ADAPTIVE DENSITY DECONVOLUTION WITH DEPENDENT INPUTS 



F. COMTE, J. DEDECKER, AND M. L. TAUPIN

Abstract. In the convolution model $Z_i = X_i + \varepsilon_i$, we give a model selection procedure to estimate
the density of the unobserved variables $(X_i)_{1 \le i \le n}$, when the sequence $(X_i)_{i \ge 1}$ is strictly stationary
but not necessarily independent. This procedure depends on whether the density of $\varepsilon_i$ is super smooth
or ordinary smooth. The rates of convergence of the penalized contrast estimators are the same as in
the independent framework, and are minimax over most classes of regularity on $\mathbb{R}$. Our results apply
to mixing sequences, but also to many other dependent sequences. When the errors are super smooth,
the condition on the dependence coefficients is the minimal condition of that type ensuring that the
sequence $(X_i)_{i \ge 1}$ is not a long-memory process.

1. Introduction

The problem of estimating the density of identically distributed but not independent random variables
$X_1, \dots, X_n$ when they are observed with an additive and independent noise is encountered in
numerous contexts. This problem is described by the model

(1.1) $Z_i = X_i + \varepsilon_i$, for $i = 1, \dots, n$,

where one observes $Z_1, \dots, Z_n$, and where $(\varepsilon_i)_{1 \le i \le n}$ are independent and identically distributed (i.i.d.),
and independent of $(X_i)_{1 \le i \le n}$. When $(X_i)_{1 \le i \le n}$ is a Markov chain, the model (1.1) is a particular
case of hidden Markov models, with an additive structure.

Our aim is the adaptive estimation of $g$, the common density of the unobserved variables
$(X_i)_{1 \le i \le n}$, when the density $f_\varepsilon$ of $\varepsilon_1$ is known. More precisely we shall build an estimator of $g$ without
any prior knowledge on its smoothness, using the observations $(Z_i)_{1 \le i \le n}$ and the knowledge of the
convolution kernel $f_\varepsilon$. We shall assume that the known density $f_\varepsilon$ belongs to various collections of
densities, and that the dependence properties of the sequence $(X_i)_{i \ge 1}$ are described by appropriate
dependence coefficients. Specifically, we consider two types of dependent sequences. We assume
either that the sequence $(X_i)_{i \ge 1}$ is absolutely regular in the sense of Rozanov and Volkonskii (1960),
or that it is $\tau$-dependent in the sense of Dedecker and Prieur (2005). These dependence conditions
are presented in Section 2 and motivated through various examples.

In density deconvolution, two factors determine the estimation accuracy: first, the smoothness
of the density $g$ to be estimated, and second, the smoothness of the error density, the worst rates of
convergence being obtained for the smoothest error densities. We shall consider two classes of densities
for $f_\varepsilon$: first the so-called super smooth densities, whose Fourier transform decays exponentially,
and next the class of ordinary smooth densities, whose Fourier transform decays polynomially.

Let us briefly recall the previous results in the independent framework. To our knowledge, the
first adaptive estimator has been proposed by Pensky and Vidakovic (1999). It is a wavelet estimator
constructed via a thresholding procedure. This estimator achieves the minimax rates when $g$ belongs
to a Sobolev class, but it fails to reach the minimax rates when both the error density and $g$ are
super smooth. More recently, Comte et al. (2006) have proposed an adaptive estimator of $g$ constructed
by minimizing an appropriate penalized contrast function depending only on the observations and on
$f_\varepsilon$. This estimator is minimax (sometimes within a negligible logarithmic factor) in all cases where
lower bounds are previously known (i.e. in most cases). More precisely, the authors obtain non-asymptotic
upper bounds for the Mean Integrated Square Error (MISE), which ensure an automatic
trade-off between a bias term and the penalty term. Hence, the estimator automatically achieves the
best rate obtained by the collection of non-penalized estimators when the (unknown) optimal space
is selected (sometimes up to a negligible logarithmic factor). When both the density and the errors
are super smooth, this adaptive estimator significantly improves on the rates given by the adaptive
estimator built in Pensky and Vidakovic (1999), whereas both adaptive estimators have the same rate
in the other cases. This improvement partly comes from the choice of the Shannon basis (see Section
3.2) instead of the wavelet basis considered in Pensky and Vidakovic.

In the dependent context, we follow the approach proposed in Comte et al. (2006). We give adaptive
estimators of $g$, constructed by minimizing an appropriate penalized contrast function. The penalty
function depends on the known density $f_\varepsilon$, but it does not depend on the dependence coefficients
of the sequence $(X_i)_{i \ge 1}$. The adaptive estimators have the same rates as in the independent case,
under mild conditions on the dependence coefficients of $(X_i)_{i \ge 1}$. The important point here is that
the penalty functions are the same (or almost the same) as in the independent framework. This is a
bit surprising: indeed, when the $(X_i)_{1 \le i \le n}$ are observed (i.e. $\varepsilon = 0$), the threshold level proposed in
Tribouley and Viennet (1998) as well as the penalty function given in Comte and Merlevède (2002)
(see also our Corollary 5.2) depend on the mixing coefficients of the sequence $(X_i)_{i \ge 1}$.

In Section 4 we deal with non adaptive estimators. As usual, we show that the MISE of the minimum
contrast estimator is bounded by a squared bias plus a variance term. The variance term can be split
into two terms. The first and dominating term of the variance is exactly the variance of a density
deconvolution estimator in the independent context. It is as usual related to $\int_0^{C_n} |f_\varepsilon^*(x)|^{-2}\,dx$ with
$C_n \to \infty$. The second and negligible term in the variance is the term involving the dependence
structure of the sequence $(X_i)_{i \ge 1}$. The main consequence of this first result is that this non adaptive
estimator reaches the (minimax) rates of the i.i.d. case (as given in Fan (1991), Butucea (2004), and
Butucea and Tsybakov (2005)), as soon as the dependence coefficients are summable. Moreover, even
if the coefficients are not summable, there is no loss in the rate provided that the partial sums of
the coefficients do not grow too fast with respect to $\int_0^{C_n} |f_\varepsilon^*(x)|^{-2}\,dx$. These results have to be
compared with previously known results for non adaptive density deconvolution in dependent contexts.
For strongly mixing sequences in the sense of Rosenblatt (1956), Masry (1993) proposes a kernel-type
estimator for the joint density $g_p$ of $(X_1, \dots, X_p)$ when it exists. For the (pointwise) Mean Square
Error, he obtains the same rates as in the i.i.d. case provided that $\alpha(n) = O(n^{-2-\delta})$ for ordinary
smooth $f_\varepsilon$, and provided that $\alpha(n) = O(n^{-1-\delta})$ for super smooth $f_\varepsilon$. When $p = 1$, our assumption on
the mixing coefficients is weaker, since we only need $\sum_{n>0} \alpha(n) < \infty$ in both cases (see our Remark
4.1).

In the main part (Section 5), we study the adaptive estimators. We show that the squared bias
term and the variance term obtained in the upper bound of the MISE of the adaptive estimator are
the same as in the independent case. The model selection procedure depends on whether the density
$f_\varepsilon$ is super smooth or ordinary smooth.

When $f_\varepsilon$ is super smooth, the adaptive estimator is constructed with the exact penalty of the
independent context. Its rate of convergence is exactly the same as in the independent case, provided
that the dependence coefficients of $(X_i)_{i \ge 1}$ are summable. The main tools in this case are covariance
inequalities for dependent variables, and concentration inequalities. The case of super smooth errors
is particularly important, since it contains the case of Gaussian errors. It also contains the stochastic
volatility model, in which $\varepsilon_i \sim \ln(\mathcal{N}(0,1)^2)$ (see Van Es et al. (2003, 2005), Comte (2004), Comte and
Genon-Catalot (2005)).

When $f_\varepsilon$ is ordinary smooth, the adaptive estimator is constructed with a penalty of the same order
as in the independent context. Its rate of convergence is exactly the same as in the independent case.
For ordinary smooth errors, the main tools are the coupling properties of the dependence coefficients
(see Section 2.1). To use these properties, we need to consider a more restrictive type of dependence
than for super smooth errors, and we need to impose a polynomial decrease of the coefficients.

In both cases, super and ordinary smooth, the results hold for $\beta$-mixing and $\tau$-dependent random
variables $(X_i)_{i \ge 1}$. To our knowledge, this is the first time that adaptive density deconvolution in a
dependent context is considered. The robustness of this estimation procedure to dependency relies heavily
on the independence between $(X_i)_{1 \le i \le n}$ and $(\varepsilon_i)_{1 \le i \le n}$, and on the fact that the errors are i.i.d. random
variables. We refer to Comte et al. (2005, 2006) for practical implementation of the estimators, and
for the calibration of the constants in the penalty functions. In Comte et al. (2005), the robustness of
the procedure to various types of dependence has been tested in practice (see Tables 4 and 5 therein).



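2. Some measures of dependence

Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space. Let $Y$ be a random variable with values in a Banach space
$(\mathbb{B}, \|\cdot\|_{\mathbb{B}})$, and let $\mathcal{M}$ be a $\sigma$-algebra of $\mathcal{A}$. Let $\mathbb{P}_{Y|\mathcal{M}}$ be a conditional distribution of $Y$ given $\mathcal{M}$, and
let $\mathbb{P}_Y$ be the distribution of $Y$. Let $\mathcal{B}(\mathbb{B})$ be the Borel $\sigma$-algebra on $(\mathbb{B}, \|\cdot\|_{\mathbb{B}})$, and let $\Lambda_1(\mathbb{B})$ be the
set of 1-Lipschitz functions from $(\mathbb{B}, \|\cdot\|_{\mathbb{B}})$ to $\mathbb{R}$. Define now

$\beta(\mathcal{M}, \sigma(Y)) = \mathbb{E}\Big( \sup_{A \in \mathcal{B}(\mathbb{B})} |\mathbb{P}_{Y|\mathcal{M}}(A) - \mathbb{P}_Y(A)| \Big),$

and if $\mathbb{E}(\|Y\|_{\mathbb{B}}) < \infty$, $\tau(\mathcal{M}, Y) = \mathbb{E}\Big( \sup_{f \in \Lambda_1(\mathbb{B})} |\mathbb{P}_{Y|\mathcal{M}}(f) - \mathbb{P}_Y(f)| \Big).$

The coefficient $\beta(\mathcal{M}, \sigma(Y))$ is the usual mixing coefficient, introduced by Rozanov and Volkonskii
(1960). The coefficient $\tau(\mathcal{M}, Y)$ has been introduced by Dedecker and Prieur (2005).

Let $X = (X_i)_{i \ge 1}$ be a strictly stationary sequence of real-valued random variables. For any $k \ge 0$,
the coefficients $\beta_{X,1}(k)$ and $\tau_{X,1}(k)$ are defined by

$\beta_{X,1}(k) = \beta(\sigma(X_1), \sigma(X_{1+k})),$

and if $\mathbb{E}(|X_1|) < \infty$, $\tau_{X,1}(k) = \tau(\sigma(X_1), X_{1+k}).$

On $\mathbb{R}^l$, we put the norm $\|x - y\|_{\mathbb{R}^l} = l^{-1}(|x_1 - y_1| + \cdots + |x_l - y_l|)$. Let $\mathcal{M}_i = \sigma(X_k, 1 \le k \le i)$. The
coefficients $\beta_{X,\infty}(k)$ and $\tau_{X,\infty}(k)$ are defined by

$\beta_{X,\infty}(k) = \sup_{i \ge 1, l \ge 1} \sup\{ \beta(\mathcal{M}_i, \sigma(X_{i_1}, \dots, X_{i_l})),\ i + k \le i_1 < \cdots < i_l \},$

and if $\mathbb{E}(|X_1|) < \infty$, $\tau_{X,\infty}(k) = \sup_{i \ge 1, l \ge 1} \sup\{ \tau(\mathcal{M}_i, (X_{i_1}, \dots, X_{i_l})),\ i + k \le i_1 < \cdots < i_l \}.$

Let $Q_X$ be the generalized inverse of the tail function $x \mapsto \mathbb{P}(|X_1| > x)$. We have the inequalities

(2.1) $\tau_{X,1}(k) \le 2 \int_0^{\beta_{X,1}(k)} Q_X(u)\,du$ and $\tau_{X,\infty}(k) \le 2 \int_0^{\beta_{X,\infty}(k)} Q_X(u)\,du.$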

2.1. Coupling. We recall the coupling properties of these coefficients. Assume that $\Omega$ is rich enough,
which means that there exists $U$ uniformly distributed over $[0, 1]$ and independent of $\mathcal{M} \vee \sigma(X)$. There
exist two $\mathcal{M} \vee \sigma(U) \vee \sigma(X)$-measurable random variables $X_1^*$ and $X_2^*$ distributed as $X$ and independent
of $\mathcal{M}$ such that

(2.2) $\beta(\mathcal{M}, \sigma(X)) = \mathbb{P}(X \ne X_1^*)$ and $\tau(\mathcal{M}, X) = \mathbb{E}(\|X - X_2^*\|_{\mathbb{B}}).$

The first equality in (2.2) is due to Berbee (1979), and the second one has been established in Dedecker
and Prieur (2005), Section 7.1.

2.2. Covariance inequalities. Denote by $\|\cdot\|_{\infty,\mathbb{P}}$ the $L^\infty(\Omega, \mathbb{P})$-norm. Let $X, Y$ be two real-valued
random variables, and let $f, h$ be two measurable functions from $\mathbb{R}$ to $\mathbb{C}$. Then

(2.3) $|\mathrm{Cov}(f(Y), h(X))| \le 2 \|f(Y)\|_{\infty,\mathbb{P}} \|h(X)\|_{\infty,\mathbb{P}}\, \beta(\sigma(X), \sigma(Y)),$

and if $\mathrm{Lip}(h)$ is the Lipschitz coefficient of $h$,

(2.4) $|\mathrm{Cov}(f(Y), h(X))| \le \|f(Y)\|_{\infty,\mathbb{P}}\, \mathrm{Lip}(h)\, \tau(\sigma(Y), X).$

Inequalities (2.3) and (2.4) follow from the coupling properties (2.2) by noting that if $X^*$ is distributed
as $X$ and independent of $Y$,

$\mathrm{Cov}(f(Y), h(X)) = \mathbb{E}\big( f(Y)(h(X) - h(X^*)) \big).$

2.3. Examples. Examples of $\beta$-mixing sequences are well known (we refer to the books by Doukhan
(1994) and Bradley (2002)). One of the most important examples is the following: a stationary,
irreducible, aperiodic and positively recurrent Markov chain $(X_i)_{i \ge 1}$ is $\beta$-mixing, which means that
$\beta_{X,\infty}(k)$ tends to zero as $k$ tends to infinity.

Unfortunately, many simple Markov chains are not $\beta$-mixing (and not even strongly mixing in the
sense of Rosenblatt (1956)). For instance, if $(\varepsilon_i)_{i \ge 1}$ is i.i.d. with marginal $\mathcal{B}(1/2)$, then the stationary
solution $(X_i)_{i \ge 0}$ of the equation

(2.5) $X_n = \frac{1}{2}(X_{n-1} + \varepsilon_n)$, $X_0$ independent of $(\varepsilon_i)_{i \ge 1}$,

is not $\beta$-mixing (and not even strongly mixing) since $\beta_{X,1}(k) = 1$ for any $k \ge 0$. By contrast, for this
particular example, one has $\tau_{X,\infty}(k) \le 2^{-k}$. More generally, the coefficient $\tau_{X,\infty}(k)$ is easy to compute
in many situations (see Dedecker and Prieur (2005)). Let us recall some important examples:

Linear processes. Assume that $X_n = \sum_{j \ge 0} a_j \zeta_{n-j}$, where $(\zeta_i)_{i \in \mathbb{Z}}$ is i.i.d. One has the bounds

$\tau_{X,\infty}(k) \le 2\,\mathbb{E}(|\zeta_0|) \sum_{j \ge k} |a_j|$ and $\tau_{X,\infty}(k) \le \sqrt{2\,\mathrm{Var}(\zeta_0) \sum_{j \ge k} a_j^2}.$

Markov chains. Let $(X_n)_{n \ge 0}$ be a stationary Markov chain such that $X_n = F(X_{n-1}, \zeta_n)$ for some
measurable function $F$ and some i.i.d. sequence $(\zeta_i)_{i \ge 1}$ independent of $X_0$. Assume that there exists
$\kappa < 1$ such that

$\mathbb{E}(|F(x, \zeta_0) - F(y, \zeta_0)|) \le \kappa |x - y|.$

Then one has the inequality

$\tau_{X,\infty}(k) \le 2\,\mathbb{E}(|X_0|)\, \kappa^k.$

An important example is $X_n = f(X_{n-1}) + \zeta_n$ for some $\kappa$-Lipschitz function $f$.

Expanding maps. Let $T$ be a Borel-measurable map from $[0,1]$ to $[0,1]$. If the probability $\mu$
is invariant by $T$, the sequence $(Y_i = T^i)_{i \ge 0}$ of random variables from $([0,1], \mu)$ to $[0,1]$ is strictly
stationary. Define the operator $K$ from $L^1([0,1], \mu)$ to $L^1([0,1], \mu)$ via the equality

$\int_0^1 (Kh)(x)\, k(x)\, \mu(dx) = \int_0^1 h(x)\, (k \circ T)(x)\, \mu(dx),$

where $h \in L^1([0,1], \mu)$ and $k \in L^\infty([0,1], \mu)$. It is easy to check that $(Y_1, Y_2, \dots, Y_n)$ has the same
distribution as $(X_n, X_{n-1}, \dots, X_1)$, where $(X_i)_{i \ge 1}$ is a stationary Markov chain with invariant
distribution $\mu$ and transition kernel $K$. If $T$ is uniformly expanding (see for instance the assumptions on
page 218 in Dedecker and Prieur (2005)), then there exist $C > 0$ and $\rho$ in $]0, 1[$ such that

$\tau_{X,\infty}(k) \le C \rho^k$

(see Dedecker and Prieur (2005), page 230). Note that the Markov chain $(X_i)_{i \ge 1}$ is not $\beta$-mixing (and not
even strongly mixing). Indeed $\beta(\sigma(X_1), \sigma(X_n)) = \beta(\sigma(T^n), \sigma(T))$. Since $\sigma(T^n) \subset \sigma(T)$, it follows
that

$\beta(\sigma(X_1), \sigma(X_n)) \ge \beta(\sigma(T^n), \sigma(T^n)) = \beta(\sigma(T), \sigma(T)),$

and the latter is positive as soon as $\mu$ is non trivial.
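
Returning to the autoregressive example (2.5), the bound $\tau_{X,\infty}(k) \le 2^{-k}$ can be illustrated by a coupling argument in the spirit of (2.2). The sketch below is only an illustration under stated assumptions: the function names `step` and `coupled_gap` are hypothetical, and the uniform law on $[0,1]$ is used as the stationary distribution of the chain. Two copies of the chain are started from independent initial values and driven by the same innovations, so that their distance is halved at every step.

```python
import random

def step(x, eps):
    """One step of the recursion X_n = (X_{n-1} + eps_n) / 2 from (2.5)."""
    return 0.5 * (x + eps)

def coupled_gap(k, rng):
    """Run two chains with independent starts and shared Bernoulli(1/2) innovations.

    Returns the pair (|X_0 - X_0*|, |X_k - X_k*|).
    """
    # Assumption: the stationary law of the chain is uniform on [0, 1].
    x, x_star = rng.random(), rng.random()
    gap0 = abs(x - x_star)
    for _ in range(k):
        eps = rng.randint(0, 1)  # Bernoulli(1/2) innovation shared by both chains
        x, x_star = step(x, eps), step(x_star, eps)
    return gap0, abs(x - x_star)

rng = random.Random(0)
gap0, gap10 = coupled_gap(10, rng)
# The gap after k steps is 2^{-k} times the initial gap, hence at most 2^{-k}.
print(gap0, gap10, 2.0 ** -10)
```

Since the coupled copy is independent of the starting value of the original chain, the expected gap after $k$ steps gives a bound of the form $2^{-k}$ on the $\tau$-coefficient, even though the $\beta$-coefficient of the same chain stays equal to 1.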

3. Assumptions and estimators 



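For two complex-valued functions $u$ and $v$ in $L_2(\mathbb{R}) \cap L_1(\mathbb{R})$, let

$u^*(x) = \int e^{itx} u(t)\,dt$, $u * v(x) = \int u(y) v(x - y)\,dy$, and $\langle u, v \rangle = \int u(x) \overline{v}(x)\,dx,$

with $\overline{z}$ the conjugate of a complex number $z$. We also use the notations

$\|u\|_1 = \int |u(x)|\,dx$, $\|u\|^2 = \int |u(x)|^2\,dx$, and $\|u\|_\infty = \sup_{x \in \mathbb{R}} |u(x)|.$
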
3.1. Assumptions for density deconvolution. The smoothness of $f_\varepsilon$ is described by the following
assumption.

(A1) There exist nonnegative numbers $\kappa_0$, $\kappa_0'$, $\gamma$, $\mu$ and $\delta$ such that $f_\varepsilon^*$ satisfies
$\kappa_0 (x^2 + 1)^{-\gamma/2} \exp\{-\mu |x|^\delta\} \le |f_\varepsilon^*(x)| \le \kappa_0' (x^2 + 1)^{-\gamma/2} \exp\{-\mu |x|^\delta\}.$

(A2) The density $f_\varepsilon$ belongs to $L_2(\mathbb{R})$ and for all $x \in \mathbb{R}$, $f_\varepsilon^*(x) \ne 0$.

Since $f_\varepsilon$ is known, the constants $\mu$, $\delta$, $\kappa_0$, and $\gamma$ defined in (A1) are also known. When $\delta = 0$
in (A1), $f_\varepsilon$ is usually called "ordinary smooth". When $\mu > 0$ and $\delta > 0$, $f_\varepsilon$ is called
"super smooth". Densities satisfying (A1) with $\delta > 0$ and $\mu > 0$ are infinitely differentiable. The
standard examples for super smooth densities are the following: Gaussian or Cauchy distributions are
super smooth of order $\gamma = 0$, $\delta = 2$ and $\gamma = 0$, $\delta = 1$ respectively. When $\varepsilon = \ln(\eta^2)$ with $\eta \sim \mathcal{N}(0, 1)$
as in Van Es et al. (2003, 2005), then $\varepsilon$ is super smooth with $\delta = 1$, $\gamma = 0$ and $\mu = \pi/2$. For ordinary
smooth densities, one can cite for instance the double exponential (also called Laplace) distribution
with $\delta = 0 = \mu$ and $\gamma = 2$. Although densities with $\delta > 2$ exist, they are difficult to express in a closed
form. Nevertheless, our results hold for such densities. Furthermore, the square integrability of $f_\varepsilon$ in
(A2) requires that $\gamma > 1/2$ when $\delta = 0$ in (A1).
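
As a quick sanity check of the standard examples quoted above, the following sketch compares the closed-form Fourier transforms of the standard Laplace, Gaussian and Cauchy densities with the envelope $\kappa (x^2+1)^{-\gamma/2} \exp\{-\mu |x|^\delta\}$ of the smoothness assumption. The function names and the unit scale parameters are illustrative assumptions, not notation from the paper.

```python
import math

def laplace_cf(x):
    """Fourier transform of the standard Laplace density: 1 / (1 + x^2)."""
    return 1.0 / (1.0 + x * x)

def gaussian_cf(x):
    """Fourier transform of the standard Gaussian density: exp(-x^2 / 2)."""
    return math.exp(-0.5 * x * x)

def cauchy_cf(x):
    """Fourier transform of the standard Cauchy density: exp(-|x|)."""
    return math.exp(-abs(x))

def envelope(x, kappa, gamma, mu, delta):
    """kappa * (x^2 + 1)^(-gamma/2) * exp(-mu * |x|^delta)."""
    return kappa * (x * x + 1.0) ** (-gamma / 2.0) * math.exp(-mu * abs(x) ** delta)

for x in (0.0, 0.5, 2.0, 10.0):
    # Laplace: gamma = 2, delta = 0 (ordinary smooth), with kappa_0 = kappa_0' = 1.
    print(x, laplace_cf(x), envelope(x, 1.0, 2.0, 0.0, 0.0))
    # Gaussian: gamma = 0, delta = 2, mu = 1/2 (super smooth).
    print(x, gaussian_cf(x), envelope(x, 1.0, 0.0, 0.5, 2.0))
    # Cauchy: gamma = 0, delta = 1, mu = 1 (super smooth).
    print(x, cauchy_cf(x), envelope(x, 1.0, 0.0, 1.0, 1.0))
```

For these three unit-scale examples the lower and upper envelopes coincide with the transform itself; for other scale parameters the two constants $\kappa_0$ and $\kappa_0'$ generally differ.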



Classically, the slowest rates of convergence for estimating $g$ are obtained for super smooth error
densities. In particular, when $\varepsilon$ is Gaussian and $g$ belongs to Sobolev classes, the minimax rates are
negative powers of $\ln(n)$ (see Fan (1991)). Nevertheless, the rates are improved if $g$ has stronger
smoothness properties, described by the set

(3.1) $\mathcal{S}_{s,r,b}(C_1) = \Big\{ \psi \text{ such that } \int_{-\infty}^{+\infty} |\psi^*(x)|^2 (x^2 + 1)^s \exp\{2b|x|^r\}\,dx \le C_1 \Big\}$

for $s, r, b$ non-negative numbers.

Such smoothness classes are classically considered both in deconvolution and in density estimation
without errors. When $r = 0$, (3.1) corresponds to a Sobolev ball. The functions in (3.1) with $r > 0$
and $b > 0$ are infinitely many times differentiable. They admit analytic continuation on a finite width
strip when $r = 1$ and on the whole complex plane if $r = 2$.

Subsequently, the density $g$ is supposed to satisfy the following assumption.

(A3) The density $g \in L_2(\mathbb{R})$ and there exists $M_2 > 0$ such that $\int x^2 g^2(x)\,dx \le M_2 < \infty$.

Assumption (A3), which is due to the construction of the estimator, is quite unusual in density
estimation. It already appears in density deconvolution in the independent framework in Comte et
al. (2005, 2006). It also appears in a slightly different way in Pensky and Vidakovic (1999), who
assume, instead of (A3), that $\sup_{x \in \mathbb{R}} |x| g(x) < \infty$. It is important to note that Assumption (A3) is
very unrestrictive.

All densities having tails of order $|x|^{-(s+1)}$ as $x$ tends to infinity satisfy (A3) only if $s > 1/2$. One
can cite for instance the Cauchy distribution or all stable distributions with exponent $r > 1/2$ (see
Devroye (1986)). The Lévy distribution, with exponent $r = 1/2$, does not satisfy (A3).


3.2. The projection spaces. Let $\varphi(x) = \sin(\pi x)/(\pi x)$. For $m \in \mathbb{N}$ and $j \in \mathbb{Z}$, set $\varphi_{m,j}(x) =
\sqrt{m}\,\varphi(mx - j)$. The functions $\{\varphi_{m,j}\}_{j \in \mathbb{Z}}$ constitute an orthonormal system in $L^2(\mathbb{R})$ (see e.g. Meyer
(1990), p. 22). For $m = 2^k$, it is known as the Shannon basis. Though we choose here integer values
for $m$, a thinner grid would also be possible. Let us define

$S_m = \mathrm{span}\{\varphi_{m,j},\ j \in \mathbb{Z}\}$, $m \in \mathbb{N}$.

The space $S_m$ is exactly the subspace of $L_2(\mathbb{R})$ of functions having a Fourier transform with compact
support contained in $[-\pi m, \pi m]$.

The orthogonal projection of $g$ on $S_m$ is $g_m = \sum_{j \in \mathbb{Z}} a_{m,j}(g) \varphi_{m,j}$ where $a_{m,j}(g) = \langle \varphi_{m,j}, g \rangle$. To
obtain representations having a finite number of "coordinates", we introduce

$S_m^{(n)} = \mathrm{span}\{\varphi_{m,j},\ |j| \le k_n\}$

with integers $k_n$ to be specified later. The family $\{\varphi_{m,j}\}_{|j| \le k_n}$ is an orthonormal basis of $S_m^{(n)}$, and the
orthogonal projection of $g$ on $S_m^{(n)}$ is given by $g_m^{(n)} = \sum_{|j| \le k_n} a_{m,j}(g) \varphi_{m,j}$.



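3.3. Construction of the minimum contrast estimators. For an arbitrary fixed integer $m$, an
estimator of $g$ belonging to $S_m^{(n)}$ is defined by

(3.2) $\hat g_m^{(n)} = \arg\min_{t \in S_m^{(n)}} \gamma_n(t),$

where, for $t$ in $S_m^{(n)}$,

$\gamma_n(t) = \frac{1}{n} \sum_{i=1}^n \big[ \|t\|^2 - 2 u_t^*(Z_i) \big]$, with $u_t(x) = \frac{1}{2\pi} \frac{t^*(-x)}{f_\varepsilon^*(x)}.$

By using Parseval and inverse Fourier formulae we obtain that $\mathbb{E}[u_t^*(Z_i)] = \langle t, g \rangle$, so that $\mathbb{E}(\gamma_n(t)) =
\|t - g\|^2 - \|g\|^2$ is minimal when $t = g$. This shows that $\gamma_n(t)$ suits well for the estimation of $g$.
Classical calculations show that

$\hat g_m^{(n)} = \sum_{|j| \le k_n} \hat a_{m,j} \varphi_{m,j}$ with $\hat a_{m,j} = \frac{1}{n} \sum_{i=1}^n u^*_{\varphi_{m,j}}(Z_i)$, and $\mathbb{E}(\hat a_{m,j}) = \langle g, \varphi_{m,j} \rangle = a_{m,j}(g).$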
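
The construction above can be illustrated numerically. The sketch below is a simplified illustration under stated assumptions, not the authors' implementation: it ignores the truncation index $k_n$ and evaluates the Fourier-domain form of the projection estimator, $\frac{1}{2\pi} \int_{-\pi m}^{\pi m} e^{-iux}\, \hat\psi_Z(u) / f_\varepsilon^*(u)\,du$, where $\hat\psi_Z$ denotes the empirical characteristic function of the observations. The Laplace noise, the Gaussian AR(1) inputs, and all function names are hypothetical choices made for the example.

```python
import math
import random

def laplace_cf(u, b):
    """Fourier transform of a centered Laplace density with scale b (assumed noise law)."""
    return 1.0 / (1.0 + (b * u) ** 2)

def deconvolution_estimate(x, z, m, b, grid=200):
    """Estimate g(x) from noisy observations z with spectral cutoff pi * m.

    The real part of the integrand is even in u and the imaginary part is odd,
    so the integral over [-pi m, pi m] is twice the integral of the real part
    over [0, pi m]; a midpoint rule is used for the quadrature.
    """
    cutoff = math.pi * m
    du = cutoff / grid
    total = 0.0
    for k in range(grid):
        u = (k + 0.5) * du
        emp = sum(math.cos(u * (zi - x)) for zi in z) / len(z)
        total += emp / laplace_cf(u, b) * du
    return total / math.pi

def simulate(n, rho, b, rng):
    """Stationary Gaussian AR(1) inputs with N(0,1) marginal plus i.i.d. Laplace(b) noise."""
    x = rng.gauss(0.0, 1.0)
    z = []
    for _ in range(n):
        x = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # The difference of two independent exponentials with mean b is Laplace with scale b.
        noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
        z.append(x + noise)
    return z

rng = random.Random(0)
sample = simulate(3000, 0.5, 0.5, rng)
estimate_at_zero = deconvolution_estimate(0.0, sample, m=1, b=0.5)
print(estimate_at_zero, 1.0 / math.sqrt(2.0 * math.pi))
```

In this example the inputs are dependent but the estimate at the origin should still be close to the value of the standard Gaussian density there, in line with the message of the paper that summable dependence does not change the leading variance term.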

3.4. Minimum penalized contrast estimator. As in the independent framework, the minimum
penalized estimator of $g$ is defined as $\tilde g = \hat g_{\hat m_g}$ where $\hat m_g$ is chosen in a purely data-driven way. The
main point of the estimation procedure lies in the choice of $m = \hat m_g$ for the estimators $\hat g_m^{(n)}$ from Section
3.3 in order to mimic the oracle parameter

(3.3) $\breve m_g = \arg\min_m \mathbb{E}\|\hat g_m^{(n)} - g\|^2.$

The model selection is performed in an automatic way, using the following penalized criterion

(3.4) $\tilde g = \hat g_{\hat m}^{(n)}$ with $\hat m = \arg\min_{m \in \{1, \dots, m_n\}} \big[ \gamma_n(\hat g_m^{(n)}) + \mathrm{pen}(m) \big],$

where pen(m) is a penalty function, specified in the theorems below, that depends on $f_\varepsilon^*$ through $\Delta(m)$
defined by

$\Delta(m) = \frac{1}{2\pi} \int_{-\pi m}^{\pi m} \frac{dx}{|f_\varepsilon^*(x)|^2}.$

The key point in the dependent context is to find a penalty function not depending on the mixing
coefficients such that

$\mathbb{E}\|g - \tilde g\|^2 \le C \inf_{m \in \{1, \dots, m_n\}} \mathbb{E}\|\hat g_m^{(n)} - g\|^2.$

4. Risk bounds for the minimum contrast estimators $\hat g_m^{(n)}$

We focus here on non adaptive estimation, starting with the presentation of general upper bounds
for the MISE of the minimum contrast estimators $\hat g_m^{(n)}$.



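Proposition 4.1. If (A2) and (A3) hold, then

$\mathbb{E}\|g - \hat g_m^{(n)}\|^2 \le \|g - g_m\|^2 + \frac{m^2(M_2 + 1)}{k_n} + \frac{2\Delta(m)}{n} + \frac{2R_m}{n},$

where

(4.1) $R_m = \frac{1}{\pi} \sum_{k=2}^{n} \int_{-\pi m}^{\pi m} |\mathrm{Cov}(e^{ixX_1}, e^{ixX_k})|\,dx.$

Moreover, $R_m \le \min(R_{m,\beta}, R_{m,\tau})$, where

$R_{m,\beta} = 4m \sum_{k=1}^{n-1} \beta_{X,1}(k)$ and $R_{m,\tau} = \pi m^2 \sum_{k=1}^{n-1} \tau_{X,1}(k).$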
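
To see how the two bounds on $R_m$ compare in a concrete case, the following sketch evaluates $R_{m,\beta}$ and $R_{m,\tau}$ for the autoregressive chain (2.5), using $\beta_{X,1}(k) = 1$ and the bound $\tau_{X,1}(k) \le \tau_{X,\infty}(k) \le 2^{-k}$ from Section 2.3. The helper names are hypothetical and the coefficient lists are inputs supplied by the user, not quantities estimated from data.

```python
import math

def r_m_beta(m, beta):
    """R_{m,beta} = 4 m * sum_{k=1}^{n-1} beta_{X,1}(k); `beta` lists the n-1 coefficients."""
    return 4.0 * m * sum(beta)

def r_m_tau(m, tau):
    """R_{m,tau} = pi m^2 * sum_{k=1}^{n-1} tau_{X,1}(k); `tau` lists the n-1 coefficients."""
    return math.pi * m * m * sum(tau)

def r_m_bound(m, beta, tau):
    """Upper bound min(R_{m,beta}, R_{m,tau}) on R_m."""
    return min(r_m_beta(m, beta), r_m_tau(m, tau))

n = 1000
beta = [1.0] * (n - 1)                  # beta_{X,1}(k) = 1 for the chain (2.5)
tau = [2.0 ** -k for k in range(1, n)]  # upper bounds 2^{-k} on tau_{X,1}(k)
bound = r_m_bound(3, beta, tau)
print(r_m_beta(3, beta), r_m_tau(3, tau), bound)
```

Here the $\beta$-bound grows linearly with $n$ and is useless, whereas the $\tau$-bound stays below $\pi m^2$ for every $n$, so the dependence term $2R_m/n$ in the risk bound is of order $m^2/n$.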

Remark 4.1. The term $R_m$ can be easily bounded for many other dependent sequences. For instance,
if $\alpha_{X,1}(k) = \alpha(\sigma(X_1), \sigma(X_{1+k}))$ is the usual strong mixing coefficient of Rosenblatt (1956), one has
the upper bound $R_m \le 16 m \sum_{k=1}^{n-1} \alpha_{X,1}(k)$. If $X$ is a stationary sequence of associated random
variables (see Esary et al. (1967) for the definition), then $|\mathrm{Cov}(e^{ixX_1}, e^{ixX_k})| \le 4x^2\,\mathrm{Cov}(X_1, X_k)$, so
that $R_m \le (8\pi^2/3)\, m^3 \sum_{k=2}^{n} \mathrm{Cov}(X_1, X_k)$. For more about density deconvolution with associated
inputs, we refer to the paper by Masry (2003).

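We now comment on the rates resulting from Proposition 4.1. As usual, the variance term $n^{-1}\Delta(m)$
depends on the rate of decay of the Fourier transform of $f_\varepsilon$. According to Lemma 7.2 and according
to Butucea and Tsybakov (2005), under (A1)-(A2), we have

$\lambda_1(f_\varepsilon, \kappa_0')\,\Gamma(m)(1 + o(1)) \le \Delta(m) \le \lambda_1(f_\varepsilon, \kappa_0)\,\Gamma(m)(1 + o(1))$ as $m \to \infty$,

(4.2) where $\Gamma(m) = (1 + (\pi m)^2)^{\gamma} (\pi m)^{1-\delta} \exp\{2\mu(\pi m)^{\delta}\},$

(4.3) $\lambda_1(f_\varepsilon, \kappa_0) = \frac{1}{\kappa_0^2\, \pi\, R(\mu, \delta)}$, and $R(\mu, \delta) = \mathbf{1}_{\{\delta = 0\}} + 2\mu\delta\, \mathbf{1}_{\{\delta > 0\}}.$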
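
The polynomial growth of the variance term in the ordinary smooth case can be checked on a simple example. The sketch below assumes that $\Delta(m) = \frac{1}{2\pi} \int_{-\pi m}^{\pi m} |f_\varepsilon^*(x)|^{-2}\,dx$ and takes Laplace errors with scale $b$, for which $f_\varepsilon^*(x) = 1/(1 + b^2 x^2)$ and the integral has a closed form; the function names are illustrative assumptions.

```python
import math

def laplace_cf(u, b):
    """Fourier transform of a centered Laplace density with scale b."""
    return 1.0 / (1.0 + (b * u) ** 2)

def delta_numeric(m, cf, grid=20000):
    """Midpoint-rule approximation of (1 / 2pi) * int_{-pi m}^{pi m} |cf(u)|^{-2} du.

    Assumes |cf| is even, so the integral is twice the one over [0, pi m].
    """
    cutoff = math.pi * m
    du = cutoff / grid
    half = sum(1.0 / cf((k + 0.5) * du) ** 2 for k in range(grid)) * du
    return half / math.pi

def delta_laplace(m, b):
    """Closed form for Laplace errors: integrate (1 + b^2 u^2)^2 over [0, pi m], divide by pi."""
    c = math.pi * m
    return (c + 2.0 * b ** 2 * c ** 3 / 3.0 + b ** 4 * c ** 5 / 5.0) / math.pi

numeric = delta_numeric(3, lambda u: laplace_cf(u, 0.5))
exact = delta_laplace(3, 0.5)
growth = delta_laplace(100, 0.5) / delta_laplace(50, 0.5)
print(numeric, exact, growth)
```

Doubling $m$ multiplies $\Delta(m)$ by nearly $2^5$ for large $m$, which matches the order $m^{2\gamma + 1}$ with $\gamma = 2$ for Laplace errors.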



If (A1)-(A2) and (A3) hold, and if $k_n \ge n$, we have the upper bound

(4.4) $\mathbb{E}\|g - \hat g_m^{(n)}\|^2 \le \|g - g_m\|^2 + \frac{m^2(M_2 + 1)}{n} + \frac{2\lambda_1(f_\varepsilon, \kappa_0)\,\Gamma(m)}{n} + \frac{2R_m}{n}.$

Finally, since $g_m$ is the orthogonal projection of $g$ on $S_m$, we get that $g_m^* = g^* \mathbf{1}_{[-\pi m, \pi m]}$ and therefore

$\|g - g_m\|^2 = \frac{1}{2\pi} \|g^* - g_m^*\|^2 = \frac{1}{2\pi} \int_{|x| \ge \pi m} |g^*|^2(x)\,dx.$
If $g$ belongs to the class $\mathcal{S}_{s,r,b}(C_1)$ defined in (3.1), then

$\|g - g_m\|^2 \le \frac{C_1}{2\pi} (m^2\pi^2 + 1)^{-s} \exp\{-2b\pi^r m^r\}.$

Hence, according to (4.4), if (A3) holds and $k_n \ge n$, the risk of $\hat g_m^{(n)}$ is bounded by

$\frac{C_1}{2\pi} (m^2\pi^2 + 1)^{-s} \exp\{-2b\pi^r m^r\} + \frac{2\lambda_1(f_\varepsilon, \kappa_0)(1 + (\pi m)^2)^{\gamma} (\pi m)^{1-\delta} \exp\{2\mu\pi^\delta m^\delta\}}{n} + \frac{m^2(M_2 + 1)}{n} + \frac{2R_m}{n}.$

Assume now that either $\sum_{k>0} \beta_{X,1}(k) < \infty$ or $\sum_{k>0} \tau_{X,1}(k) < \infty$, so that the residual terms
$n^{-1}R_m + n^{-1}m^2(M_2 + 1)$ are of order $n^{-1}m^2$. As in the independent case, we choose $\breve m$ as the
minimizer of

$(m^2\pi^2 + 1)^{-s} \exp\{-2b\pi^r m^r\} + \frac{(1 + (\pi m)^2)^{\gamma} (\pi m)^{1-\delta} \exp\{2\mu\pi^\delta m^\delta\}}{n}.$

The behavior of $\breve m$ is recalled in Table 1. We see that in all cases, the residual terms $n^{-1}R_{\breve m} +
n^{-1}\breve m^2(M_2 + 1)$ of order $n^{-1}\breve m^2$ are negligible with respect to the main terms since $n^{-1}\Delta(m)$ grows
faster than $n^{-1}m^2$ (recall that if $\delta = 0$, we have the restriction $\gamma > 1/2$, cf. Section 3.1). Hence the
rate of convergence of $\hat g_{\breve m}^{(n)}$ is the same as in the i.i.d. case (see Table 1 below).



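Table 1. Choice of $\breve m$ and corresponding rates under Assumptions (A1)-(A2) and (A3).

                        | $f_\varepsilon$ ordinary smooth ($\delta = 0$)             | $f_\varepsilon$ super smooth ($\delta > 0$)
------------------------+---------------------------------------------+---------------------------------------------
$g$ Sobolev($s$)        | $\pi\breve m = O(n^{1/(2s+2\gamma+1)})$                  | $\pi\breve m = [\ln(n)/(2\mu + 1)]^{1/\delta}$
($r = 0$)               | rate $= O(n^{-2s/(2s+2\gamma+1)})$                 | rate $= O((\ln(n))^{-2s/\delta})$
                        | minimax rate                                | minimax rate
------------------------+---------------------------------------------+---------------------------------------------
$g$ with $r > 0$        | $\pi\breve m = [\ln(n)/2b]^{1/r}$                       | $\breve m$ solution of
                        | rate $= O(\ln(n)^{(2\gamma+1)/r}/n)$                | $\breve m^{2s+2\gamma+1-r} \exp\{2\mu(\pi\breve m)^\delta + 2b\pi^r\breve m^r\} = O(n)$
                        | minimax rate                                | minimax rate if $r < \delta$ and $s = 0$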

When $r > 0$, $\delta > 0$ the value of $\breve m$ is not explicitly given. It is obtained as the solution of the
equation

$\breve m^{2s+2\gamma+1-r} \exp\{2\mu(\pi\breve m)^\delta + 2b\pi^r \breve m^r\} = O(n).$

Consequently, the rate of $\hat g_{\breve m}^{(n)}$ is not explicit and depends on the ratio $r/\delta$. If $r/\delta$ or $\delta/r$ belongs to
$]k/(k+1); (k+1)/(k+2)]$ with $k$ integer, the rate of convergence can be expressed as a function of $k$.
We refer to Comte et al. (2006) for further discussions about those rates. We refer to Lacour (2006)
for explicit formulae for the rates in the special case $r > 0$, $\delta > 0$.
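
The order of the optimal cutoff in the Sobolev and ordinary smooth case ($r = 0$, $\delta = 0$) can be recovered numerically. The sketch below minimizes the sum of the squared-bias proxy $(x^2 + 1)^{-s}$ and the variance proxy $(1 + x^2)^{\gamma} x / n$ in $x = \pi m$ (constants are dropped, and $m$ is treated as a continuous variable for simplicity) and estimates the exponent of $n$ from two sample sizes. Function names, grid bounds and sample sizes are illustrative assumptions.

```python
import math

def risk_proxy(x, n, s, gamma):
    """Squared-bias proxy plus variance proxy at cutoff x = pi * m (delta = 0, r = 0)."""
    return (x * x + 1.0) ** (-s) + (1.0 + x * x) ** gamma * x / n

def best_cutoff(n, s, gamma, lo=0.1, hi=1e3, points=20000):
    """Grid search on a logarithmic grid for the cutoff minimizing risk_proxy."""
    best_x, best_val = lo, risk_proxy(lo, n, s, gamma)
    log_step = math.log(hi / lo) / points
    for k in range(1, points + 1):
        x = lo * math.exp(k * log_step)
        val = risk_proxy(x, n, s, gamma)
        if val < best_val:
            best_x, best_val = x, val
    return best_x

def empirical_exponent(s, gamma, n1=1e6, n2=1e10):
    """Slope of log(optimal cutoff) against log(n) between two sample sizes."""
    x1 = best_cutoff(n1, s, gamma)
    x2 = best_cutoff(n2, s, gamma)
    return math.log(x2 / x1) / math.log(n2 / n1)

slope = empirical_exponent(2.0, 2.0)
print(slope, 1.0 / (2 * 2.0 + 2 * 2.0 + 1))
```

With $s = \gamma = 2$ the estimated slope should be close to $1/(2s + 2\gamma + 1) = 1/9$, in agreement with the first cell of Table 1.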

5. Risk bounds for adaptive estimators

In the previous section, the construction of the estimators requires the knowledge of the smoothness
of $g$. We now come to adaptive estimation, without such prior knowledge.

5.1. A first bound in adaptive density deconvolution. Theorem 5.1 gives a general bound which
holds under mild dependence conditions, for $f_\varepsilon$ being either ordinary or super smooth. For $a > 1$, let
pen(m) be defined by



(5.5) 



pen(m) 



24a^^ if < 5 < 1/3, 



n 



I ^ ^ 48^7r'^A2(/.,ACo) ^^ A(m) ^min((3V2-l/2)+.^)) ^ ^ 



Ai(/£,Koj / n 
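The constant $\lambda_1(f_\varepsilon, \kappa_0)$ is defined in (4.3) and $\lambda_2(f_\varepsilon, \kappa_0)$ is given by

(5.6) $\lambda_2(f_\varepsilon, \kappa_0) = \|f_\varepsilon\| \sqrt{2\lambda_1(f_\varepsilon, \kappa_0)}\, \mathbf{1}_{\{0 \le \delta \le 1\}} + 2\lambda_1(f_\varepsilon, \kappa_0)\, \mathbf{1}_{\{\delta > 1\}}.$

In order to bound pen(m), we impose that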

nV(27+i) if 5 = 

'ln(n) ' 



(5.7) 



TTrrir, < < 



ln(n) 27 + 1 - (5 , 
H ^— In 



2/x 



i/s 



if J > 0. 



2/i ' 2(5/z 

Subsequently we set

(5.8) $\kappa_a = (a + 1)/(a - 1)$, and $C_a = \max(\kappa_a^2, 2\kappa_a).$

Theorem 5.1. Assume that $f_\varepsilon$ satisfies (A1)-(A2), that $g$ satisfies (A3), and that $m_n$ satisfies (5.7).
Consider the collection of estimators $\hat g_m^{(n)}$ defined by (3.2) with $k_n \ge n$ and $1 \le m \le m_n$. Let pen(m)
be defined by (5.5). The estimator $\tilde g = \hat g_{\hat m}^{(n)}$ defined by (3.4) satisfies

$\mathbb{E}(\|g - \tilde g\|^2) \le C_a \inf_{m \in \{1, \dots, m_n\}} \Big[ \|g - g_m\|^2 + \mathrm{pen}(m) + \frac{m^2(M_2 + 1)}{n} \Big] + \frac{C(R_{m_n} + m_n)}{n},$

where $R_m$ is defined in (4.1), $C_a$ is defined in (5.8), and $C$ is a constant depending on $f_\varepsilon$ and $a$.

Let us compare the rate of $\tilde g$ with the rate obtained in the independent framework. The term
$\inf_{m \in \{1, \dots, m_n\}} [\|g - g_m\|^2 + \mathrm{pen}(m) + m^2(M_2 + 1)/n]$ corresponds to the rate of $\tilde g$ when all variables
are i.i.d. The dependent context induces the additional term $n^{-1}(R_{m_n} + m_n)$. If the dependence
coefficients are summable and the errors are super smooth, then $n^{-1}(R_{m_n} + m_n)$ is negligible and $\tilde g$
achieves the rate of the independent framework. If $\varepsilon$ is ordinary smooth, the term $n^{-1}(R_{m_n} + m_n)$
may not be negligible and Theorem 5.1 is not precise enough.

5.2. Adaptive density deconvolution for super smooth $f_\varepsilon$. If (A1)-(A2) hold for some $\delta > 0$,
we have the following corollary.

Corollary 5.1. Assume that $f_\varepsilon$ satisfies (A1)-(A2) with $\delta > 0$, that $g$ satisfies (A3), and that $m_n$
satisfies (5.7). Let pen(m) be defined by (5.5). Consider the collection of estimators $\hat g_m^{(n)}$ defined by
(3.2) with $k_n \ge n$ and $1 \le m \le m_n$.

(1) If $\sum_{k>0} \beta_{X,1}(k) < +\infty$, then the estimator $\tilde g = \hat g_{\hat m}^{(n)}$ defined by (3.4) satisfies

$\mathbb{E}(\|g - \tilde g\|^2) \le C_a \inf_{m \in \{1, \dots, m_n\}} \Big[ \|g - g_m\|^2 + \mathrm{pen}(m) + \frac{m^2(M_2 + 1)}{n} \Big] + \frac{C (\ln(n))^{1/\delta}}{n},$

where $C_a$ is defined in (5.8) and $C$ is a constant depending on $f_\varepsilon$, $a$ and $\sum_{k>0} \beta_{X,1}(k)$.

(2) If $\sum_{k>0} \tau_{X,1}(k) < +\infty$, then the estimator $\tilde g = \hat g_{\hat m}^{(n)}$ defined by (3.4) satisfies

$\mathbb{E}(\|g - \tilde g\|^2) \le C_a \inf_{m \in \{1, \dots, m_n\}} \Big[ \|g - g_m\|^2 + \mathrm{pen}(m) + \frac{m^2(M_2 + 1)}{n} \Big] + \frac{C (\ln(n))^{2/\delta}}{n},$

where $C_a$ is defined in (5.8) and $C$ is a constant depending on $f_\varepsilon$, $a$ and $\sum_{k>0} \tau_{X,1}(k)$.

Corollary 5.1 requires important comments. The terms involving powers of $\ln(n)$ are negligible
with respect to $\inf_{m \in \{1, \dots, m_n\}} [\|g - g_m\|^2 + \mathrm{pen}(m) + m^2(M_2 + 1)/n]$. The risk of $\tilde g$ is of order
$\inf_{m \in \{1, \dots, m_n\}} [\|g - g_m\|^2 + \mathrm{pen}(m)]$, that is of the best order, as in the independent framework. The
penalty does not depend on the dependence coefficients and is the same as in the independent framework.

As a conclusion, we see that the adaptive estimator $\tilde g$, built with the same penalty as in the independent
framework, still achieves the best rates under mild conditions on the dependence coefficients.

5.3. Adaptive density deconvolution for ordinary smooth $f_\varepsilon$. For $a > 1$, define pen(m) by

(5.9) $\mathrm{pen}(m) = \frac{25 a \Delta(m)}{n}.$

Theorem 5.2. Assume that $f_\varepsilon$ satisfies (A1)-(A2) with $\delta = 0$, that $g$ satisfies (A3), and that $m_n$
satisfies (5.7). Let pen(m) be defined by (5.9). Consider the collection of estimators $\hat g_m^{(n)}$ defined by
(3.2) with $k_n \ge n$ and $1 \le m \le m_n$.

(1) If $\beta_{X,\infty}(k) = O(k^{-(1+\theta)})$ for some $\theta > (2\gamma + 3)/(2\gamma + 1)$, then the estimator $\tilde g = \hat g_{\hat m}^{(n)}$ defined
by (3.4) satisfies

(5.10) $\mathbb{E}(\|g - \tilde g\|^2) \le C_a \inf_{m \in \{1, \dots, m_n\}} \Big[ \|g - g_m\|^2 + \mathrm{pen}(m) + \frac{m^2(M_2 + 1)}{n} \Big] + \frac{C}{n},$

where $C_a$ is defined in (5.8) and $C$ is a constant depending on $f_\varepsilon$, $a$, and $\sum_{k>0} \beta_{X,\infty}(k)$.

(2) If $\tau_{X,\infty}(k) = O(k^{-(1+\theta)})$ for some $\theta > (2\gamma + 5)/(2\gamma + 1)$, then the estimator $\tilde g = \hat g_{\hat m}^{(n)}$ defined
by (3.4) satisfies (5.10), where $C$ is a constant depending on $f_\varepsilon$, $a$ and $\sum_{k>0} \tau_{X,\infty}(k)$.

Remark 5.1. Note that the condition for $\beta_{X,\infty}(k)$ is realized for any $\gamma > 1/2$ provided $\theta > 2$. In the
same way, the condition for $\tau_{X,\infty}(k)$ is realized for any $\gamma > 1/2$ provided $\theta > 3$. In both cases, the
condition on $\theta$ is weaker as $\gamma$ increases. In other words, the smoother $f_\varepsilon$ is, the weaker the condition
on the dependence coefficients.

Remark 5.2. For $m$ large enough, the penalty function given for ordinary smooth errors in Theorem
5.2 is an upper bound of more precise penalty functions which depend on the dependence coefficients.
Under the assumptions of (1) in Theorem 5.2, let pen(m) be defined by

(5.11) $\mathrm{pen}(m) = \frac{24 a \Delta(m)}{n} + \frac{128 a \big(1 + 4 \sum_{k=1}^{n} \beta_{X,1}(k)\big)\, m}{n}.$

Under the assumptions of (2) in Theorem 5.2, let pen(m) be defined by

(5.12) $\mathrm{pen}(m) = \frac{24 a \Delta(m)}{n} + \frac{64 a [1 + 38 \ln(m)] \big(m + \pi m^2 \sum_{k=1}^{n} \tau_{X,1}(k)\big)}{n}.$

In both cases, the estimator $\tilde g = \hat g_{\hat m}^{(n)}$ defined by (3.4) satisfies (5.10). Remark 5.2 follows from the
proof of Theorem 5.2.

5.4. Case without noise. One can deduce from Proposition 4.1, Theorem 5.2, its proof and Remark
5.2 a result for density estimation without errors, on the whole real line, that is when the $X_i$ are
observed. If $\varepsilon = 0$, then we can consider that $Z = X$ and replace $f_\varepsilon^*$ by 1. It follows that $u_t^*(Z_i) = t(X_i)$
and the contrast $\gamma_n$ simply becomes

(5.13) $\gamma_{n,X}(t) = \|t\|^2 - \frac{2}{n} \sum_{i=1}^{n} t(X_i).$

Let $k_n \ge n^2$, and consider as previously

(5.14) $\hat g_m^{(n)} = \arg\min_{t \in S_m^{(n)}} \gamma_{n,X}(t)$, $\mathrm{pen}(m) = \frac{128 a \big(1 + 4 \sum_{k=1}^{n} \beta_{X,1}(k)\big)\, m}{n},$

and

(5.15) $\hat m = \arg\min_{m \in \{1, \dots, n\}} \big[ \gamma_{n,X}(\hat g_m^{(n)}) + \mathrm{pen}(m) \big].$
The following results follow straightforwardly.

Corollary 5.2. Assume that $\varepsilon = 0$. Let $k_n \ge n^2$. Then

(1) $\mathbb{E}\|g - \hat g_m^{(n)}\|^2 \le \|g - g_m\|^2 + \frac{m^2(M_2 + 1)}{n^2} + \frac{2m}{n} + \frac{2R_m}{n}.$

(2) If $\beta_{X,\infty}(k) = O(k^{-(1+\theta)})$ for some $\theta > 3$, then the estimator $\tilde g = \hat g_{\hat m}^{(n)}$ defined by (5.14) and (5.15)
satisfies

$\mathbb{E}(\|g - \tilde g\|^2) \le C_a \inf_{m \in \{1, \dots, n\}} \Big[ \|g - g_m\|^2 + \mathrm{pen}(m) + \frac{m^2(M_2 + 1)}{n^2} \Big] + \frac{C}{n},$

where $C_a$ is defined in (5.8) and $C$ is a constant depending on $a$ and $\sum_{k>0} \beta_{X,\infty}(k)$.

The result (1) shows that if $\sum_{k>0} \beta_{X,1}(k) < \infty$, one obtains the same bounds (and the same rates)
as in the i.i.d. case. However, if only $\sum_{k>0} \tau_{X,1}(k) < \infty$, the term $n^{-1}R_m$ is of order $n^{-1}m^2$ and the rates
for $\hat g_m^{(n)}$ are slower than in the i.i.d. case.

The result (2) shows that this estimation procedure also works in density estimation without errors.
It allows one to estimate a density on the whole real line and to reach the usual rates of convergence, by
using a penalty of the classical order $m/n$. This remark is valid in the $\beta$-mixing framework and in the
case of independent $X_i$'s. We refer to Pensky (1999) and Rigollet (2006) for recent results in adaptive
density estimation on the whole real line in the i.i.d. case.
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6. Proofs 

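6.1. Proof of Proposition 4.1. The proof of Proposition 4.1 follows the same lines as in the
independent framework (see Comte et al. (2006)). The main difference lies in the control of the variance
term. We keep the same notations as in Section 3.3. According to (3.2), for any given $m$ belonging
to $\{1,\dots,m_n\}$, $\hat g_m^{(n)}$ satisfies $\gamma_n(\hat g_m^{(n)}) - \gamma_n(g_m^{(n)}) \le 0$. For a random variable $Y$ with density $f_Y$, and
any function $\psi$ such that $\psi(Y)$ is integrable, let

(6.1)  $\nu_{n,Y}(\psi) = \frac{1}{n}\sum_{i=1}^{n} [\psi(Y_i) - \langle \psi, f_Y\rangle]$, so that $\nu_{n,X}(t) = \frac{1}{n}\sum_{i=1}^{n} [t(X_i) - \langle t, g\rangle]$.

Since

(6.2)  $\gamma_n(t) - \gamma_n(s) = \|t - g\|^2 - \|s - g\|^2 - 2\nu_{n,Z}(u^*_{t-s})$,

we infer that

(6.3)  $\|g - \hat g_m^{(n)}\|^2 \le \|g - g_m^{(n)}\|^2 + 2\nu_{n,Z}(u^*_{\hat g_m^{(n)} - g_m^{(n)}})$.

Writing that $\hat a_{m,j} - a_{m,j} = \nu_{n,Z}(u^*_{\varphi_{m,j}})$, we obtain

$\nu_{n,Z}(u^*_{\hat g_m^{(n)} - g_m^{(n)}}) = \sum_{|j|\le k_n} (\hat a_{m,j} - a_{m,j})\,\nu_{n,Z}(u^*_{\varphi_{m,j}}) = \sum_{|j|\le k_n} [\nu_{n,Z}(u^*_{\varphi_{m,j}})]^2$.

Consequently, $\mathbb{E}\|g - \hat g_m^{(n)}\|^2 \le \|g - g_m^{(n)}\|^2 + 2\sum_{j\in\mathbb{Z}} \mathbb{E}[(\nu_{n,Z}(u^*_{\varphi_{m,j}}))^2]$. According to Comte et al. (2006),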

(6.4) \\g - g^ f =\\ g-gmf+\\g^- g^ f<\\g-gmf +^^^^"^'^^ + 



Jm II II ^ II ' ii.:?/'t c7m II — II ^ ^"^ II ' 7 

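The variance term is studied by using that for $f \in L_1(\mathbb{R})$,

(6.5)  $\nu_{n,Z}(f^*) = \int \nu_{n,Z}(e^{ix\cdot}) f(x)\,dx$.

Now, we use (6.5) and apply Parseval's formula to obtain

(6.6)  $\sum_{j\in\mathbb{Z}} \mathbb{E}(\nu_{n,Z}(u^*_{\varphi_{m,j}}))^2 = \frac{1}{2\pi}\int_{-\pi m}^{\pi m} \frac{\mathbb{E}|\nu_{n,Z}(e^{ix\cdot})|^2}{|f_\varepsilon^*(x)|^2}\,dx$.

Since $\nu_{n,Z}$ involves centered and stationary variables,

$\mathbb{E}|\nu_{n,Z}(e^{ix\cdot})|^2 = \mathrm{Var}|\nu_{n,Z}(e^{ix\cdot})| = \frac{1}{n^2}\Big[\sum_{k=1}^{n} \mathrm{Var}(e^{ixZ_k}) + \sum_{1\le k\ne l\le n} \mathrm{Cov}(e^{ixZ_k}, e^{ixZ_l})\Big]$

(6.7)  $= \frac{1}{n}\mathrm{Var}(e^{ixZ_1}) + \frac{1}{n^2}\sum_{1\le k\ne l\le n} \mathrm{Cov}(e^{ixZ_k}, e^{ixZ_l})$.

Since $(X_j)_{j\ge 1}$ and $(\varepsilon_j)_{j\ge 1}$ are independent, we have $\mathbb{E}(e^{ixZ_1}) = f_\varepsilon^*(x)g^*(x)$ so that

$\mathrm{Cov}(e^{ixZ_k}, e^{ixZ_l}) = \mathbb{E}(e^{ix(Z_k - Z_l)}) - |\mathbb{E}(e^{ixZ_1})|^2 = \mathbb{E}(e^{ix(Z_k - Z_l)}) - |f_\varepsilon^*(x)g^*(x)|^2$.

Next, by independence of $X$ and $\varepsilon$, we write, for $k \ne l$,

$\mathbb{E}(e^{ix(Z_k - Z_l)}) = \mathbb{E}(e^{ix(X_k - X_l)})\,|f_\varepsilon^*(x)|^2$,

and consequently

(6.8)  $\mathrm{Cov}(e^{ixZ_k}, e^{ixZ_l}) = \mathrm{Cov}(e^{ixX_k}, e^{ixX_l})\,|f_\varepsilon^*(x)|^2$.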

From (6.7), (6.8) and the stationarity of $(X_j)_{j\ge 1}$, we obtain that

(6.9)  $\mathbb{E}|\nu_{n,Z}(e^{ix\cdot})|^2 \le \frac{1}{n} + \frac{2}{n}\sum_{k=2}^{n} |\mathrm{Cov}(e^{ixX_1}, e^{ixX_k})|\,|f_\varepsilon^*(x)|^2$.

The first part of Proposition 4.1 follows from the stationarity of the $X_i$'s, and from (6.3), (6.4), (6.6)
and (6.9).

Let us prove that $R_m \le \min(R_{m,\beta}, R_{m,\tau})$, where $R_{m,\beta}$ and $R_{m,\tau}$ are defined in Proposition 4.1.
Using the inequalities (2.3) and (2.4), we obtain the bounds

$|\mathrm{Cov}(e^{ixX_1}, e^{ixX_k})| \le 2\beta_{X,1}(k-1)$ and $|\mathrm{Cov}(e^{ixX_1}, e^{ixX_k})| \le |x|\,\tau_{X,1}(k-1)$

(for the last inequality, note that $t \mapsto e^{ixt}$ is $|x|$-Lipschitz). The result easily follows.

6.2. Proof of Theorem 5.1. By definition, $\tilde g$ satisfies that for all $m \in \{1,\dots,m_n\}$,

$\gamma_n(\tilde g) + \mathrm{pen}(\hat m) \le \gamma_n(g_m^{(n)}) + \mathrm{pen}(m)$.

Therefore, by using (6.2) we get that

$\|\tilde g - g\|^2 \le \|g_m^{(n)} - g\|^2 + 2\nu_{n,Z}(u^*_{\tilde g - g_m^{(n)}}) + \mathrm{pen}(m) - \mathrm{pen}(\hat m)$.

If $t = t_1 + t_2$ with $t_1$ in $S_m^{(n)}$ and $t_2$ in $S_{m'}^{(n)}$, then $t^*$ has its support in $[-\pi\max(m,m'), \pi\max(m,m')]$ and $t$
belongs to $S^{(n)}_{\max(m,m')}$. Set $B_{m,m'}(0,1) = \{t \in S^{(n)}_{\max(m,m')} : \|t\| = 1\}$. For $\nu_{n,Z}$ defined in (6.1) we get

$|\nu_{n,Z}(u^*_{\tilde g - g_m^{(n)}})| \le \|\tilde g - g_m^{(n)}\| \sup_{t\in B_{m,\hat m}(0,1)} |\nu_{n,Z}(u_t^*)|$.

Using that $2uv \le a^{-1}u^2 + a v^2$ for any $a > 1$ leads to

$\|\tilde g - g\|^2 \le \|g_m^{(n)} - g\|^2 + a^{-1}\|\tilde g - g_m^{(n)}\|^2 + a \sup_{t\in B_{m,\hat m}(0,1)} (\nu_{n,Z}(u_t^*))^2 + \mathrm{pen}(m) - \mathrm{pen}(\hat m)$.
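
The elementary inequality used in this step follows from expanding a square. The short derivation below is a sketch added for completeness; $u$ and $v$ stand for the norm and the supremum appearing above, and $\tilde g$ is used here as a name for the penalized estimator (a notational assumption).

```latex
% Valid for every a > 0; the proof applies it with a > 1.
0 \le \big(a^{-1/2}u - a^{1/2}v\big)^2 = a^{-1}u^2 - 2uv + a\,v^2
\quad\Longrightarrow\quad 2uv \le a^{-1}u^2 + a\,v^2 ,
\\
u = \|\tilde g - g_m^{(n)}\|, \qquad
v = \sup_{t\in B_{m,\hat m}(0,1)} |\nu_{n,Z}(u_t^*)| .
```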

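Now, according to Lemma 7.1, write that $\nu_{n,Z}(u_t^*) = \nu_n^{(1)}(t) + \nu_{n,X}(t)$, where

(6.10)  $\nu_n^{(1)}(t) = \frac{1}{n}\sum_{i=1}^{n} \big[u_t^*(Z_i) - \mathbb{E}(u_t^*(Z_i)\,|\,\sigma(X_i, i \ge 0))\big] = \frac{1}{n}\sum_{i=1}^{n} [u_t^*(Z_i) - t(X_i)]$.

Consequently,

$\|\tilde g - g\|^2 \le \|g_m^{(n)} - g\|^2 + a^{-1}\|\tilde g - g_m^{(n)}\|^2 + 2a \sup_{t\in B_{m,\hat m}(0,1)} (\nu_n^{(1)}(t))^2 + 2a \sup_{t\in B_{m,\hat m}(0,1)} (\nu_{n,X}(t))^2$
$\qquad + \mathrm{pen}(m) - \mathrm{pen}(\hat m)$.

Hence by writing that $\|\tilde g - g_m^{(n)}\|^2 \le (1 + \kappa_a^{-1})\|\tilde g - g\|^2 + (1 + \kappa_a)\|g - g_m^{(n)}\|^2$, with $\kappa_a$ defined in (5.8),
we have

$\|\tilde g - g\|^2 \le \kappa_a^2\|g_m^{(n)} - g\|^2 + 2a\kappa_a \sup_{t\in B_{m,\hat m}(0,1)} (\nu_n^{(1)}(t))^2 + 2a\kappa_a \sup_{t\in B_{m,\hat m}(0,1)} (\nu_{n,X}(t))^2$
$\qquad + \kappa_a(\mathrm{pen}(m) - \mathrm{pen}(\hat m))$.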
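
The constants in this last bound can be checked by hand. The sketch below assumes that the constant of (5.8) is $\kappa_a = (a+1)/(a-1)$; this value is not restated in this section and is inferred only because it reproduces the coefficients $\kappa_a^2$ and $2a\kappa_a$ of the display. The symbol $S$ abbreviates the sum of the two suprema, and $\tilde g$ denotes the penalized estimator.

```latex
% Assumption: \kappa_a = (a+1)/(a-1) with a > 1 (inferred, not quoted from (5.8)).
% Insert  \|\tilde g-g_m^{(n)}\|^2 \le (1+\kappa_a^{-1})\|\tilde g-g\|^2 + (1+\kappa_a)\|g-g_m^{(n)}\|^2
% into    \|\tilde g-g\|^2 \le \|g_m^{(n)}-g\|^2 + a^{-1}\|\tilde g-g_m^{(n)}\|^2 + 2aS + pen(m) - pen(\hat m):
\Big(1-\frac{1+\kappa_a^{-1}}{a}\Big)\|\tilde g-g\|^2
  \le \Big(1+\frac{1+\kappa_a}{a}\Big)\|g-g_m^{(n)}\|^2 + 2aS
      + \mathrm{pen}(m)-\mathrm{pen}(\hat m),
\\
1-\frac{1+\kappa_a^{-1}}{a} = 1-\frac{2}{a+1} = \frac{a-1}{a+1} = \kappa_a^{-1},
\qquad
1+\frac{1+\kappa_a}{a} = 1+\frac{2}{a-1} = \frac{a+1}{a-1} = \kappa_a ,
\\
\|\tilde g-g\|^2 \le \kappa_a^2\,\|g-g_m^{(n)}\|^2 + 2a\kappa_a S
      + \kappa_a\big(\mathrm{pen}(m)-\mathrm{pen}(\hat m)\big).
```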
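Choose some positive function $p(m,m')$ such that

(6.11)  $2a\,p(m,m') \le \mathrm{pen}(m) + \mathrm{pen}(m')$.

For this function $p(m,m')$ we have

$\|\tilde g - g\|^2 \le \kappa_a^2\|g - g_m^{(n)}\|^2 + 2\kappa_a\,\mathrm{pen}(m) + 2a\kappa_a \sup_{t\in B_{m,\hat m}(0,1)} (\nu_{n,X}(t))^2 + 2a\kappa_a W_n(m,\hat m)$

(6.12)  $\le \kappa_a^2\|g - g_m^{(n)}\|^2 + 2\kappa_a\,\mathrm{pen}(m) + 2a\kappa_a \sup_{t\in B_{m,\hat m}(0,1)} (\nu_{n,X}(t))^2 + 2a\kappa_a \sum_{m'=1}^{m_n} W_n(m,m')$,

where

(6.13)  $W_n(m,m') = \Big[\sup_{t\in B_{m,m'}(0,1)} |\nu_n^{(1)}(t)|^2 - p(m,m')\Big]_+$.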
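
The role of condition (6.11) in passing to (6.12) can be made explicit. The lines below are a sketch of that step; they use only the definition (6.13), condition (6.11) with $m' = \hat m$, and the non-negativity of each $W_n(m,m')$.

```latex
\sup_{t\in B_{m,\hat m}(0,1)} |\nu_n^{(1)}(t)|^2 \le W_n(m,\hat m) + p(m,\hat m),
\qquad
2a\,p(m,\hat m) \le \mathrm{pen}(m)+\mathrm{pen}(\hat m),
\\
2a\kappa_a \sup_{t\in B_{m,\hat m}(0,1)} |\nu_n^{(1)}(t)|^2
   + \kappa_a\big(\mathrm{pen}(m)-\mathrm{pen}(\hat m)\big)
 \le 2a\kappa_a\, W_n(m,\hat m) + 2\kappa_a\,\mathrm{pen}(m),
\\
W_n(m,\hat m) \le \sum_{m'=1}^{m_n} W_n(m,m')
\quad\text{since } W_n(m,m') \ge 0 \text{ for every } m'.
```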



The main parts of the proof lie in the two following points:

1) Study of $W_n(m,m')$, and more precisely find $p(m,m')$ such that for a constant $A_1$,

(6.14)  $\sum_{m'=1}^{m_n} \mathbb{E}(W_n(m,m')) \le \frac{A_1}{n}$.

2) Study of $\sup_{t\in B_{m,\hat m}(0,1)} (\nu_{n,X}(t))^2$, and more precisely prove that

(6.15)  $\mathbb{E}\Big[\sup_{t\in B_{m,\hat m}(0,1)} (\nu_{n,X}(t))^2\Big] \le \frac{m_n + R_{m_n}}{n}$,

where $R_m$ is defined in (4.1). Combining (6.12), (6.14) and (6.15), we infer that, for all $1 \le m \le m_n$,

$\mathbb{E}\|g - \tilde g\|^2 \le \kappa_a^2\|g - g_m^{(n)}\|^2 + 2\kappa_a\,\mathrm{pen}(m) + \frac{2a\kappa_a(m_n + R_{m_n})}{n} + \frac{2a\kappa_a A_1}{n}$.

If we denote by $C_a = \max(\kappa_a^2, 2\kappa_a)$, this can also be written

$\mathbb{E}\|g - \tilde g\|^2 \le C_a \inf_{m\in\{1,\dots,m_n\}} \big[\|g - g_m\|^2 + \|g_m^{(n)} - g_m\|^2 + \mathrm{pen}(m)\big] + \frac{2a\kappa_a(m_n + R_{m_n})}{n} + \frac{2a\kappa_a A_1}{n}$

< Ca mf 5 -5m + (M2 + l)m + pen(m) H \ . 

me{l,--- ,m„} J :k, ^ 



Proof of (6.14). We start by writing $\mathbb{E}(W_n(m,m')) = \mathbb{E}[\sup_{t\in B_{m,m'}(0,1)} |\nu_n^{(1)}(t)|^2 - p(m,m')]_+$ as

$\mathbb{E}\Big\{\mathbb{E}_X\Big[\sup_{t\in B_{m,m'}(0,1)} |\nu_n^{(1)}(t)|^2 - p(m,m')\Big]_+\Big\}$,

where $\mathbb{E}_X(Y)$ denotes the conditional expectation $\mathbb{E}(Y\,|\,\sigma(X_i, i \ge 0))$. The point is that, conditionally
to $\sigma(X_i, i \ge 0)$, the random variables $u_t^*(Z_i) - \mathbb{E}(u_t^*(Z_i)\,|\,\sigma(X_i, i \ge 0))$ are centered and independent, but
not identically distributed. We proceed as in the independent case (see Comte et al. (2006)), by
applying the following lemma to the expectation $\mathbb{E}_X[\sup_{t\in B_{m,m'}(0,1)} |\nu_n^{(1)}(t)|^2 - p(m,m')]_+$.

Lemma 6.1. Let Yi, . . . ,Yn be independent random variables and let J- be a countable class of uni- 
formly bounded measurable functions. Then for > 



E 



sup|z.„,y(/)|^-2(l + 2e2)/72 



< 



—e ^'^ V -\- 



98Mi _2£l£(i)inH. 
^ -g 7v^ A/i 



with C(0 = Vl + - 1> Ki = 1/6, and 



sup \\j iioo 



< Ml, E 



sup |fn,y(/)| < H, sup - Var(/(yfc)) < v. 



k=l 



The proof of this inequality can be found in the Appendix. It comes from a concentration inequality in
Klein and Rio (2005) and arguments that can be found in Birgé and Massart (1998). Usual density
arguments show that this result can be applied to the class of functions $\mathcal{F} = B_{m,m'}(0,1)$. Let us
denote by $m^* = \max(m,m')$. Applying Lemma 6.1, one has the bound



Ex 
where 



sup |4i)(t)|2-2(l + 2a^' 
'*eiJ^w(o,i) 



< 



6 I V ly, f2 nH^ 



sup ||nn^i)||oo < Ml, Ex 

*G-B„,m'(0,l) 



98M? 

-g ^^/2 Ml 



1 



sup \K^'{t)\ <H, sup - Varx(n*(Zfc)) < 



V. 



By applying Lemma EH we propose to take 



H'^ = H'^{m*) 



A(m* 



n 



-, Ml = Mi(m*) = Vni72 and v = v{m*) 
with, for fz denoting the density of Zi, 

(6.16) A2{m,h)= / :i^,\J'L dxdy. 



^A2(m*,/i) 
2tt 



J —irm J —nm Ife i-^^fe (l/)! 

From the definition 1)6. 13(1 of Wn{m,m'), by taking p{m,m') = 2(1 + 2^^)/7^(m*), we get that 

(6.17) E(VF„(m,m')) < e|Ex[ sup \ul^^\t)\'^ - 2{l + 2f)H^{m*)\ }. 

According to the condition (6.11), we thus take $\mathrm{pen}(m) = 4a\,p(m,m) = 8n^{-1}a(1 + 2\xi^2)\Delta(m)$, where
$\xi^2$ is suitably chosen in the control of the sum of the right-hand side of (6.17). Set $m_0$ such that for
$m^* \ge m_0$,

(6.18)  $(1/2)\lambda_1(f_\varepsilon, \kappa_0')\Gamma(m^*) \le \Delta(m^*) \le 2\lambda_1(f_\varepsilon, \kappa_0)\Gamma(m^*)$,

where $\Gamma(m)$ is defined in (4.2) and $\lambda_1(f_\varepsilon, \kappa_0)$ and $\lambda_1(f_\varepsilon, \kappa_0')$ are defined in (4.3). We split the sum over
$m'$ in two parts and write

(6.19)  $\sum_{m'=1}^{m_n} \mathbb{E}(W_n(m,m')) = \sum_{m' | m^* \le m_0} \mathbb{E}(W_n(m,m')) + \sum_{m' | m^* > m_0} \mathbb{E}(W_n(m,m'))$.
By applying Lemma 6.1 and (6.18), we get the global bound $\mathbb{E}_X(W_n(m,m')) \le K[I(m^*) + II(m^*)]$,
where $I(m^*)$ and $II(m^*)$ are defined by

l(m ) = exp < — ivi4 



n y v[m*) 

and //(m*) = ^l^expj-^^^^^^V^ 

Since / and // do not depend on the Xj's, we infer that E(l^„(m, m')) < K[I(m*) + II{m*)]. 
When m* < mo, with mo finite, we get that for all m G {1, • • • , mn}, 

y E(VF„(m,m'))<^^. 

m'\m*<mo 

We now come to the sum over m' such that m* > mo- 



When (5 > 1 we use a rough bound for A2(m, /i) given by A2(m, /i) < 2imH'^{m). 
When < (5 < 1, write that 

A2{m,h) <\\ |/*r^ll[_^m,7rm] lloo A(m) II h* f {2tt). 

Under we use that \\h*f < \\f*f < oo, that \/2^||/*|| = \\fe\\ and apply dlTTKl) to infer 

that for m* > mo, 



(6.20) t;(m*) = ^^^^""''^^ < A2(/e, ^o)r2(m*), 
where A2(/£, kq) is defined in ()5.6() and 

(6.21) r2(m) = (1 + (7rm)2)T(7rm)™'^«i/2-5/2),(i-5)) exp(2/i(7rm)'^) = (7rm)-(i/2-5/2)+p(^)^ 
Combining 1)6. 18() and ()6.2U|) . we get that for m* > mo, 

^(^*) < A2(/e,K0)r2(m-)^^ r_E:,g^A,(/.,^-0)(^^.^(V2-./2), 

n [ 2A2(/£,Ko) 

Aim*) r 2Kiec(e)v^i 



and II{m*) < — — exp 



7V2 S 



• Study of J2m'\ni*>mo Hi'm*)- According to the choices for v{m*), H'^{m*) and Mi{m*), we have 

E "(m) < E— »p| — — I 

m'|m*>mo m'=l 

Since under ()5.7p . n~^A(m„) is bounded, we deduce that Ylm'\m*>mQ < n^^C. 

• Study of Ern'\m'>mo ^("^*)- denote by V = 27 + min(l/2 - 6/2, 1 - 6), oj = (1/2 - 6/2)+, and 
K' = KiXi{fir, k'q) / {2\2[fe, Ko))- For a, 6 > 1, we have that 

max(a,6)'^e^'"'*™''''('''''^%"'^'^'' < (o'^e^'"'*"* _^ ^Vg2M^^6'')g-(i^'eV2)(a"+6") 
Consequently,



J2 limn < ^ A,(/,,Kojr,(m-) i_K,UXi{fs,^'o)^^^.^a/2-S/2U 

m'|m*>mo m—1 



m'=l 



n 
m'=l 



Case < 5 < 1/3. In that case, since 6 < (1/2- 6/2)+, the choice = i 

ensures that the quantity 

r2(rn.) exp{ — (i^'^^/2)(m)("^/^~^/^^} is bounded, and thus the first term in (|6.23|) is bounded by C/n. 
Since 1 < m < m„ with m„ satisfying n'^ Em"=i ^^{m') exp{-{K' /2){m')^^/'^-^/'^^} is bounded 

by C/n, and hence X]m'|m*>mo -^("''*) — Dn"^. According to ()6.1ip . the result follows by choosing 
pen(m) = Aap{m,m') = 2Aan~^ A(m). 

Case (5 = 1/3. According to (lOHll . we choose such that 2ii-K\mf - {K'(,^/2)m^ = -2^i{'nmf that 
is = (SuTT^ \2{fe, fio))/{KiXi{fe, k'q)). Arguing as for the case < 6 < 1/3, this choice ensures that 
Z^m'|m*>mo -^("^*) — ™d Consequently (|(i.l4j) holds. The result follows by taking p(m, m') = 

2(1 + 2^2)A(m*)n-\ and pen(m) = 8a(l + 2^2)A(m)n"^ 

Case 5 > 1/3. In that case J > (1/2-5/2)+. Choose ^'^(m) such that 2fnT^{m)^-{K'^'^ /2)m'-^/^'^^+ = 
-2fj.TT\mf. Hence ^2^^) ^ (8/i(7r)'^A2(/ e, Ko)/ (KiAi(/e, K[)))(7rm)^~(^/^''^/^)+ . This choice ensures 
that X]m'|m*>mo -^("''*) — ^/''^' ^'^^^ (|6.14|) holds. The result follows by choosing p{j 
2(1 + 2^2(m*j)A(m*)/n, associated to pen(m) = 8a(l + 2^2(m))A(m)/n. 

Proof of (|6.15p . Since max(m, m) < m„, according to 1)6. 5jl . 

sup E(z.„,x(t))' < sup Ef;l /"z.„,x(e'">*(-^)c^^l 
teB„,rf,(o,i) te5„„,||i||=i V^^./ / 

< 



Im, m') 




< -^^^ I > '|Cov(e*^^Se^^^M|(ix 

and Theorem 15. II is proved. □ 

6.3. Proof of Theorem 5.2 (1). We use the coupling argument recalled in Section 2.1 to build
approximating variables for the $X_i$'s. For $n = 2p_nq_n + r_n$, $0 \le r_n < q_n$, and $\ell = 0,\dots,p_n - 1$, denote
by

$E_\ell = (X_{2\ell q_n+1},\dots,X_{(2\ell+1)q_n})$,  $F_\ell = (X_{(2\ell+1)q_n+1},\dots,X_{(2\ell+2)q_n})$,

$E_\ell^* = (X^*_{2\ell q_n+1},\dots,X^*_{(2\ell+1)q_n})$,  $F_\ell^* = (X^*_{(2\ell+1)q_n+1},\dots,X^*_{(2\ell+2)q_n})$.

The variables $E_\ell^*$ and $F_\ell^*$ are such that

- $E_\ell^*$, $E_\ell$, $F_\ell^*$ and $F_\ell$ are identically distributed,
- $\mathbb{P}(E_\ell \ne E_\ell^*) \le \beta_{X,\infty}(q_n)$ and $\mathbb{P}(F_\ell \ne F_\ell^*) \le \beta_{X,\infty}(q_n)$,

- The variables $(E_\ell^*)_{0\le \ell\le p_n-1}$ are i.i.d., and so are the variables $(F_\ell^*)_{0\le \ell\le p_n-1}$.

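Without loss of generality and for sake of simplicity we assume that $r_n = 0$. For $\kappa_a$ defined in (5.8),
we start from

$\|\tilde g - g\|^2 \le \kappa_a^2\|g_m^{(n)} - g\|^2 + 2a\kappa_a \sup_{t\in B_{m,\hat m}(0,1)} (\nu_n^{(1)}(t))^2 + 2a\kappa_a \sup_{t\in B_{m,\hat m}(0,1)} (\nu_{n,X}(t))^2$
$\qquad + \kappa_a(\mathrm{pen}(m) - \mathrm{pen}(\hat m))$

$\le \kappa_a^2\|g_m^{(n)} - g\|^2 + 2a\kappa_a \sup_{t\in B_{m,\hat m}(0,1)} (\nu_n^{(1)}(t))^2 + 4a\kappa_a \sup_{t\in B_{m,\hat m}(0,1)} (\nu^*_{n,X}(t))^2$
$\qquad + 4a\kappa_a \sup_{t\in B_{m,\hat m}(0,1)} (\nu_{n,X}(t) - \nu^*_{n,X}(t))^2 + \kappa_a(\mathrm{pen}(m) - \mathrm{pen}(\hat m))$,

where $\nu^*_{n,X}(t)$ is defined as $\nu_{n,X}(t)$ with $X_i^*$ instead of $X_i$. Choose $p_1(m,m')$ and $p_2(m,m')$ such that

$2a\,p_1(m,m') \le \mathrm{pen}_1(m) + \mathrm{pen}_1(m')$ and $4a\,p_2(m,m') \le \mathrm{pen}_2(m) + \mathrm{pen}_2(m')$,

for $\mathrm{pen}(m) = \mathrm{pen}_1(m) + \mathrm{pen}_2(m)$. It follows that

$\|\tilde g - g\|^2 \le \kappa_a^2\|g - g_m^{(n)}\|^2 + 2\kappa_a\,\mathrm{pen}(m) + 4a\kappa_a W^*_{n,X}(m,\hat m) + 4a\kappa_a \sup_{t\in B_{m,\hat m}(0,1)} (\nu_{n,X}(t) - \nu^*_{n,X}(t))^2$
$\qquad + 2a\kappa_a W_n(m,\hat m)$

(6.24)  $\le \kappa_a^2\|g - g_m^{(n)}\|^2 + 2\kappa_a\,\mathrm{pen}(m) + 4a\kappa_a \sum_{m'=1}^{m_n} W^*_{n,X}(m,m') + 2a\kappa_a \sum_{m'=1}^{m_n} W_n(m,m')$
$\qquad + 4a\kappa_a \sup_{t\in B_{m,\hat m}(0,1)} (\nu_{n,X}(t) - \nu^*_{n,X}(t))^2$,

where

(6.25)  $W_n(m,m') := \Big[\sup_{t\in B_{m,m'}(0,1)} |\nu_n^{(1)}(t)|^2 - p_1(m,m')\Big]_+$,

(6.26)  $W^*_{n,X}(m,m') := \Big[\sup_{t\in B_{m,m'}(0,1)} |\nu^*_{n,X}(t)|^2 - p_2(m,m')\Big]_+$.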


The main parts of the proof lie in the three following points:

1) Study of $W_n(m,m')$. More precisely, we have to find $p_1(m,m')$ such that for a constant $A_2$,

(6.27)  $\sum_{m'=1}^{m_n} \mathbb{E}(W_n(m,m')) \le \frac{A_2}{n}$.

2) Study of $W^*_{n,X}(m,m')$. More precisely, we have to find $p_2(m,m')$ such that for a constant $A_3$,

(6.28)  $\sum_{m'=1}^{m_n} \mathbb{E}(W^*_{n,X}(m,m')) \le \frac{A_3}{n}$.

3) Study of $\sup_{t\in B_{m,\hat m}(0,1)} (\nu_{n,X}(t) - \nu^*_{n,X}(t))^2$. More precisely, we have to prove that

(6.29)  $\mathbb{E}\Big[\sup_{t\in B_{m,\hat m}(0,1)} (\nu^*_{n,X}(t) - \nu_{n,X}(t))^2\Big] \le 4\beta_{X,\infty}(q_n)\,m_n \le \frac{A_4}{n}$.

Proof of (6.27). The proof of (6.27) for ordinary smooth errors (that is, $\delta = 0$) is the same as the
proof of (6.14), by taking $p_1(m,m') = p(m,m')$, with $p(m,m')$ as in the proof of (6.14) and $\xi^2 = 1$.
Hence we choose $\mathrm{pen}_1(m) = 24a\,n^{-1}\Delta(m)$.

Proof of (6.28). We proceed as in the independent case by applying Lemma 6.1. Set $m^* = \max(m,m')$.
The process $W^*_{n,X}(m,m')$ must be split into two terms $(W^*_{n,1,X}(m,m') + W^*_{n,2,X}(m,m'))$ involving
respectively the odd and even blocks, which are of the same type. More precisely $W^*_{n,k,X}(m,m')$ is
defined, for $k = 1, 2$, by



(m, m) 



sup 

'<eB^,„,(o,i) 



Pn Qn 



Pnqr 



EE '(^ 



=1 i=i 



)-{t,9)) -P2,k{m,m') 



We only study W* ^ j^{m,m') and conclude for W*2xi''^^''^') using analogous arguments. The 
study of W* ^ (m, m') consists in applying Lemma IHTTl to i^* ^ -^{t) defined by 

<,i,xit) = — 5Z^9n/,x(i) with = —^ii^2eq„+j) - {t,9), 



Pn 



considered as the sum of the pn independent random variables z^*^ e x(^)- Denote by M^(m*), H*{m*) 
and v*{m*) quantities such that 



sup 

*e-B„ ,(0,1) 



< Mt{m*), E( sup Kxxm<H%m*) 
■*eB„w(o,i) 



and 



sup Var(<,,,^(t))<^'^(m*). 



Lemma 17.51 leads to the choices 

'l + 4ELi/3x,i(fc) 



ieB„.,,„,(o,i) 



m 



8 Er=o(^ + l)/3x,i(fc)||5l|oom* 



1/2 



qn 



{H\m*)Y = ^ — ^ , and v*{m*' 

n 

Take $\xi^2(m^*) = 1/2$. We use that for $m^* \ge m_0$,

$2(1 + 2\xi^2(m^*))(H^*(m^*))^2 = 4(H^*(m^*))^2 \le \Delta(m^*)/(4n)$.

Then we take $p_{2,1}(m,m') = \Delta(m)/(4n)$, and get that

^ E«,i,^(m,m')) = Yl nWli,xim,m'))+ ^ E(Ty* i,^(m, m')) 

m'=l m'\m*<mo m'|m*>mo 

< 5^ e[ sup |<,,^(t)p-4(/7'^(m*))2l 
+ \P2iim,m')-iiH^m*)f\ 

m'\m*<mo 



+ Y E 

m'\m*>mo 



sup 

t<-B ,(0,1) 
It follows that



+ 



Y,nWliAm,m')) < 2J]e[ sup - 4(//*(m*))2 

m'=l m'=l *e-B^ ,„/(0,l) 

+ \P2,i{m,m)-A{H*{m*)f\ 

m'\m*<mo 
rrin 

< 2j]E sup |<i,^(t)|2-4(i/'^(m*))2 
^,=1 LieiJ,„^„,(o,i) 

We apply Lemma IHTTl to E sup^g^ ,(o,i) i ~ 4(//*(m*))^j and obtain 

E l<i,xWl' -4(i?^K))'l < ^ E r{rn*)+ir{m*)], 

with I*{m*) and II*{m*) defined by 

r{m*) = exp I - K2Vm*j and //* 



2 * 



C7(mo) 



n 



m'=l 



.m 



■ exp 



I 7 J 



where = (i^i/32)(l + 4 ^Li /3x,i(fc))/^||5||oo Er=o(^ + l)/5x,i(A:). 
With our choice of ^^(m), if we take g„ = [n^], for c in ]0, l/2[, then 



yi{m*) < -, and V < -. 

m m—1 



Finally 



and 



sup \ul,^At)\' - ^H^m*)f 



< 



n 



C 

n 



E E[W;^(m,m')] < 2 ^ E[Ty;i,^(m, m') + Wl,^x{m,m')] < 

m'=l m'=l 

The result follows by choosing $p_2(m,m') = 2p_{2,1}(m,m') + 2p_{2,2}(m,m') = \Delta(m)/n$, and $\mathrm{pen}(m) =
25a\Delta(m)/n$.

Proof of (|6.29p . A rough bound is obtained by writing that 



sup \K,x{t) - ^n,x{t)'\'^ 



sup 



*e-B™,™(o,i) 



(n) 



lUIKi 



max (m, 771 ) ' 

< sup \K.xif) - ^n,x{t)\'^ ■ 

t65„„,||t||<l 



According to ()6.5|1 . 



<x(t)-^n,x(t) = ^ /[<x(e'^-)-^^n,x(e^^-)]r(-x)dx. 
Since $|\nu^*_{n,X}(e^{ix\cdot}) - \nu_{n,X}(e^{ix\cdot})| \le 2$, we have

1 



sup \K,xit) - ^n,x{t)\ < ™p 

(0,1) tes,nn,M\<'i-^^ 

fnm„ 



1 i-jrmn 
J-TTm„, 



-nmn 
r-irrrin 

< - I \K,x{en-^n,x{en\dx. 



1 



According to the properties of the couphng, 



E 



tGB,„,A(0,l) 



1 



vr 



7rm„ 



sup \K,xit) - '^n,xit)\ < - / ^11^*^(6*"^-) - Z^„,x(e"')|f^a; < 4/3x,oo(gn)"ln. 



For ordinary smooth errors, according to (5.7), $m_n \le n^{1/(2\gamma+1)}$. It follows that if we choose $q_n$ such that
$\beta_{X,\infty}(q_n) = O(n^{-(2\gamma+2)/(2\gamma+1)})$, then $\beta_{X,\infty}(q_n)\,m_n = O(n^{-1})$. For $q_n = [n^c]$ and $\beta_{X,\infty}(n) = O(n^{-1-\theta})$,
we obtain the condition $n^{-c(1+\theta)} = O(n^{-(2\gamma+2)/(2\gamma+1)})$. If $\theta > (2\gamma+3)/(2\gamma+1)$, one can find $c < 1/2$
such that this condition is satisfied.
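
The threshold on $\theta$ comes from a one-line computation, sketched here. It uses only $q_n = [n^c]$ with $c < 1/2$ and the polynomial decay of $\beta_{X,\infty}$ stated above; the numerical value of $\gamma$ in the last comment is purely illustrative.

```latex
\beta_{X,\infty}(q_n) = O\big(n^{-c(1+\theta)}\big),
\qquad
c(1+\theta) \ge \frac{2\gamma+2}{2\gamma+1} \ \text{ for some } c < \tfrac12
\iff \frac{1+\theta}{2} > \frac{2\gamma+2}{2\gamma+1}
\iff \theta > \frac{4\gamma+4}{2\gamma+1} - 1 = \frac{2\gamma+3}{2\gamma+1}.
% Illustration: \gamma = 1 gives the requirement \theta > 5/3.
```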

6.4. Proof of Theorem 5.2 (2). We proceed as in the $\beta$-mixing case, by using the coupling argument
given in Section 2.1. The variables $E_\ell^*$, $F_\ell^*$ are built as in Section 6.3 and are such
that

- $E_\ell^*$, $E_\ell$, $F_\ell^*$ and $F_\ell$ are identically distributed,

- $\sum_{i=1}^{q_n} \mathbb{E}(|X_{2\ell q_n+i} - X^*_{2\ell q_n+i}|) \le q_n\tau_{X,\infty}(q_n)$ and $\sum_{i=1}^{q_n} \mathbb{E}(|X_{(2\ell+1)q_n+i} - X^*_{(2\ell+1)q_n+i}|) \le q_n\tau_{X,\infty}(q_n)$,

- The variables $(E_\ell^*)_{0\le \ell\le p_n-1}$ are i.i.d., and so are the variables $(F_\ell^*)_{0\le \ell\le p_n-1}$.

Without loss of generality and for sake of simplicity we assume that $r_n = 0$. As for the proof of
Theorem 5.2 under 2), we start from (6.25). Hence we have to:

1) Study $W_n(m,m')$, and more precisely find $p_1(m,m')$ such that for a constant $K_2$,

(6.30)  $\sum_{m'=1}^{m_n} \mathbb{E}(W_n(m,m')) \le \frac{K_2}{n}$.

2) Study $W^*_{n,X}(m,m')$, and more precisely find $p_2(m,m')$ such that for a constant $K_3$,

(6.31)  $\sum_{m'=1}^{m_n} \mathbb{E}(W^*_{n,X}(m,m')) \le \frac{K_3}{n}$.

3) Study $\sup_{t\in B_{m,\hat m}(0,1)} (\nu_{n,X}(t) - \nu^*_{n,X}(t))^2$, and more precisely prove that

(6.32)  $\mathbb{E}\Big[\sup_{t\in B_{m,\hat m}(0,1)} (\nu^*_{n,X}(t) - \nu_{n,X}(t))^2\Big] \le \pi\,\tau_{X,\infty}(q_n)\,m_n^2 \le \frac{K_4}{n}$.

Proof of (6.30). The proof of (6.30) for ordinary smooth errors is the same as the proof of (6.27).
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iH*im*)f 



n 



and v*{m*) 



m 



Qn 



We take $\xi^2 = \xi^2(m) = (3/K_1 + 1)\ln(m)$. In the same way as for the proof of Theorem 5.2 (1), we use
that for $m^* \ge m_0$,

$2(1 + 2\xi^2(m^*))(H^*(m^*))^2 \le \Delta(m^*)/(4n)$.

Then we take $p_{2,1}(m,m') = \Delta(m^*)(4n)^{-1}$ and get that

5^E(W„V(m,m'))<2 J;e[ sup |<i,^(t)|2 - 2(1 + 2e2(m*))(i/*(m*))2l ' 

^,=1 L4ei?,„^„/(o,i) 



m'=l 



+ 



n 



We now apply Lemma lO to E^suptg^^ ^,(o,i) \Kxxi^)? " ^(1 + 2^'^{m*)){H* {m*)f 



and obtain 



1 1 I'll 1 1 I'll 

Ki,x(i)l'-2(l + 2C2(m*))(i7^(m*))2j < K ^ (m*) + ir (m*)], 



m'=l 



with I*{m*) and II*{m*) now defined by 



*2 



r(m*) = exp{-i^ie^(m*)} 



and II*{m*) 



n 



■ exp 



V2i^ieC(0(l + vrELi^x,i(A;)j ^ 



With this C^(m), if we take g„ = [n'^], with c in ]0, l/2[ then 

V/(m*)<- and V //(m*) < -. 

m' m'=l 

Finally $\sum_{m'=1}^{m_n} \mathbb{E}[W^*_{n,X}(m,m')] \le 2\sum_{m'=1}^{m_n} \mathbb{E}[W^*_{n,1,X}(m,m') + W^*_{n,2,X}(m,m')] \le Cn^{-1}$. The result follows
by choosing $p_2(m,m') = 2p_{2,1}(m,m') + 2p_{2,2}(m,m') = \Delta(m)n^{-1}$, and $\mathrm{pen}(m) = 25a\Delta(m)n^{-1}$.

Proof of The proof of (lO^ is similar to the proof of (UTT^ . Since je"'^* - e"*^''! < - s|, 

one has 



^E(|e- 



i=l 



It follows that 

E 



sup Wn,xii) - '^n,x{t)\'^ 

'-teB,„,™(o,i) 



'qn+i 



1 

< 



7rm„ 



For ordinary smooth errors, according to (5.7), $m_n \le n^{1/(2\gamma+1)}$. It follows that if we choose $q_n$
such that $\tau_{X,\infty}(q_n) = O(n^{-(2\gamma+3)/(2\gamma+1)})$, then $\tau_{X,\infty}(q_n)\,m_n^2 = O(n^{-1})$. For $q_n = [n^c]$ and $\tau_{X,\infty}(n) =
O(n^{-1-\theta})$, we obtain the condition $n^{-c(1+\theta)} = O(n^{-(2\gamma+3)/(2\gamma+1)})$. If $\theta > (2\gamma+5)/(2\gamma+1)$, one can
find $c < 1/2$ such that this condition is satisfied. □
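
As in the $\beta$-mixing case, the threshold on $\theta$ follows from a short computation, sketched below under the same choice $q_n = [n^c]$ with $c < 1/2$. The extra factor $m_n$ in the bound $\tau_{X,\infty}(q_n)\,m_n^2$ is what raises the threshold; the value of $\gamma$ in the last comment is only an illustration.

```latex
\tau_{X,\infty}(q_n) = O\big(n^{-c(1+\theta)}\big),
\qquad
c(1+\theta) \ge \frac{2\gamma+3}{2\gamma+1} \ \text{ for some } c < \tfrac12
\iff \frac{1+\theta}{2} > \frac{2\gamma+3}{2\gamma+1}
\iff \theta > \frac{4\gamma+6}{2\gamma+1} - 1 = \frac{2\gamma+5}{2\gamma+1}.
% Illustration: \gamma = 1 gives the requirement \theta > 7/3.
```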

6.5. Proof of Corollary 5.2. The result follows from the proof of Theorem 5.2 (1), where only the
process $\nu_{n,X}$ appears. □



7. Technical lemmas 
Lemma 7.1. If we denote by $\nu_{n,X}(t)$ the quantity defined by (6.1), then

$\frac{1}{n}\sum_{k=1}^{n} \mathbb{E}(u_t^*(Z_k)\,|\,\sigma(X_i, i \ge 0)) - \langle t, g\rangle = \nu_{n,X}(t)$.

The proof of Lemma 7.1, rather straightforward, is omitted.
Lemma 7.2. Let i^n,z{ut) be defined by \6.1\) . A{m) being defined in / l,9.,5)) . Then 



f* (xm) 



dx = A(m). 



Lemma 7.3. Let iyn,z{ui), A{m) and A2(m, /i) be defined in 116.1]) . 11^.5]) and in 116.16]) . Then 



sup II u1 ||oo< vA(m*) E[ sup Wn.z{ut)\] < y/A{m*)/n, 



and sup Var(ui(Zi)) < ^/A^(m*J^) / {2tt) . 
teB^^^, (0,1) 

We refer to Comte et al. (|20U6I!) for the proofs of Lemmas 17.21 and 17.31 



Lemma 7.4. $\big\|\sum_{j\in\mathbb{Z}} |\varphi_{m,j}|^2\big\|_\infty \le m$.
Proof of Lemma 7.4. Write



e-^--^*^^^{u)du 



m 



(2»)- 



E 



We conclude by applying Parseval's formula, which gives that

$\sum_{j\in\mathbb{Z}} |\varphi_{m,j}(x)|^2 = (2\pi)^{-1}\, m \int |\varphi^*(u)|^2\,du = m$.
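
The constant in this identity can be checked directly. The sketch below assumes the sinc basis, $\varphi(x) = \sin(\pi x)/(\pi x)$, so that $\varphi^* = \mathbf{1}_{[-\pi,\pi]}$; this definition is not restated in the present section and is taken here as an assumption.

```latex
% Assumption: \varphi^*(u) = \mathbf{1}_{[-\pi,\pi]}(u) (Fourier transform of the sinc function).
\int |\varphi^*(u)|^2\,du = \int_{-\pi}^{\pi} du = 2\pi
\quad\Longrightarrow\quad
(2\pi)^{-1}\, m \int |\varphi^*(u)|^2\,du = (2\pi)^{-1}\, m \cdot 2\pi = m .
```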



Lemma 7.5. For Bm,m'{0, 1) = {t £ S„i\/m' I PII2 = 1}, we have, for m* = m V m' , 



sup II t ||oo< Vm*, ]E[ sup Wn,i,x(.t)\] < 

*6S„,„,(0,1) teB„.„,(0,l) 



;i + 4ELi/3x,i(^))"i* 



n 



and sup Var(z^^^^^^^(t)) < 

«GB„,„,/(0,1) 



[2||5||oo(l + 32ELi(l + fe)/5x,i(A;))] 



1/2 
Proof of Lemma 7.5. For $t$ in $B_{m,m'}(0,1)$ with $m^* = m \vee m'$, one has $t = \sum_{j\in\mathbb{Z}} b_{m^*,j}\varphi_{m^*,j}$. Applying the
Cauchy-Schwarz inequality and Lemma 7.4, we obtain



sup 

tGB ,(0,1) 



t 1 1 oo ^ ^ ^ I V'm* ,j I 



1/2 



3& 



Now, using again Cauchy-Schwarz Inequahty 



E 



sup \VnXx{^)\ 



< E 



- .EVa^(<l,x('/'m*,j))- 



By analogy with (|6.6p . we write 



2 /"Trm 



This yields 



E 



sup VnXX^^^\ 

*e-B„ ,(0,1) 



< 



:i + 4ELi/?x,i(A;))m* 



n 



Finally, we apply Viennet's (|1997|1 variance inequality (see Theorem 2.1 p. 472 and Lemma 4.2 p. 
481). Hence there exist some measurable functions 6^, such that < 6^ < 1 and E 
X]fc>i(l + k)(3:s.,i{k), for which 

sup yar{vq^^e,x{t)) < sup — [ i 1 + AY^h] t'^{x)g{x)dx . 



Consequently 



sup Var(fg„^^_x(*)) < sup — ||t||oo||5 

*6B„w(0,l) t6iJ„w(0,l)9n 



,1/2 
loo 



1/2 



< 



971 



2||g|U(l + 32^(1 + A:)/3x,i(A:))- 



k=l 



Proof of Lemma 6.1. Starting from the concentration inequality given in Klein and Rio (2005)
and arguing as in Birgé and Massart (1998) (see the proof of their Corollary 2, page 354), we obtain
the upper bound



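(7.1)  $\mathbb{P}\Big(\sup_{g\in\mathcal{G}} |\nu_n(g)| \ge (1+\eta)H + \lambda\Big) \le 2\exp\Big(-K_1 n\Big(\frac{\lambda^2}{v} \wedge \frac{2\lambda(\eta\wedge 1)}{7M_1}\Big)\Big)$,

where $K_1 = 1/6$. By taking $\eta = (\sqrt{1+\varepsilon} - 1) \wedge 1 = C(\varepsilon) \le 1$ we get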

E[sup|i^„(5)p-2(l + 2e)/f2]+ < /^P (sup|i^„(5)P > 2(l + 2e)ij2 + r I dr 
geG Jo \geg J 

< rrlsuplunig)] > V2(l + e)i^2 + 2(ei72 + r/2) ) dr 
Jo \g£g J 



< 2 



/^P f sup|z/„(5)| > V(l + e)iJ+ 76/72 + ^/2 I dr 

Using that for any positive constant $C$, $\int_0^{\infty} e^{-Cx}\,dx = 1/C$ and $\int_0^{\infty} e^{-C\sqrt{x}}\,dx = 2/C^2$, we get that